(57) Abstract

The invention provides a nucleotide sequence representing a
pathogenicity island found in species of pathogenic mycobacteria,

This invention relates to the novel polynucleotide sequence we
have designated "GS", which we have identified in pathogenic
mycobacteria. GS is a pathogenicity island within 8kb of DNA
comprising a core region of 5.75kb and an adjacent transmissible
element within 2.25kb. GS is contained within Mycobacterium
paratuberculosis, Mycobacterium avium subsp. silvaticum and some
pathogenic isolates of M. avium. Functional portions of the core
region of GS are also represented by regions with a high degree
of homology that we have identified in cosmids containing genomic
DNA from Mycobacterium tuberculosis.

Background to the invention

Mycobacterium tuberculosis (Mtb) is a major cause of global
diseases of humans as well as animals. Although conventional
methods of diagnosis including microscopy, culture and skin
testing exist for the recognition of these diseases, improved
methods, particularly new immunodiagnostics and DNA-based
detection systems, are needed. Drugs used to treat tuberculosis
are increasingly encountering the problem of resistant organisms.
New drugs targeted at specific pathogenicity determinants as well
as new vaccines for the prevention and treatment of tuberculosis
are required. The importance of Mtb as a global pathogen is
reflected in the commitment being made to sequencing the entire
genome of this organism. This has generated a large amount of
DNA sequence data of genomic DNA within cosmid and other
libraries. Although the DNA sequence is known in the art, the
functions of the vast majority of these sequences, the proteins
they encode, the biological significance of these proteins, and
the overall relevance and use of these genes and their products
as diagnostics, vaccines and targets for chemotherapy for
tuberculous disease, remain entirely unknown.

Mycobacterium avium subsp. silvaticum (Mavs) is a pathogenic
mycobacterium causing diseases of animals and birds, but it can
also affect humans. Mycobacterium paratuberculosis (Mptb) causes
chronic inflammation of the intestine in many species of animals
including primates and can also cause Crohn's disease in humans.
Mptb is associated with other chronic inflammatory diseases of
humans such as sarcoidosis. Subclinical Mptb infection is
widespread in domestic livestock and is present in milk from
infected animals. The organism is more resistant to
pasteurisation than Mtb and can be conveyed to humans in retail
milk supplies. Mptb is also present in water supplies,
particularly those contaminated with run-off from heavily grazed
pastures. Mptb and Mavs contain the insertion elements IS900 and
IS902 respectively, and these are linked to pathogenicity in
these organisms. IS900 and IS902 provide convenient highly
specific multi-copy DNA targets for the sensitive detection of
these organisms using DNA-based methods and for the diagnosis of
infections in animals and humans. Much improvement is however
required in the immunodiagnosis of Mptb and Mavs infections in
animals and humans. Mptb and Mavs are, in general, resistant in
vivo to standard anti-tuberculous drugs. Although substantial
clinical improvements in infections caused by Mptb, such as
Crohn's disease, may result from treatment of patients with
combinations of existing drugs such as Rifabutin, Clarithromycin
or Azithromycin, additional effective drug treatments are
required. Furthermore, there is an urgent need for effective
vaccines for the prevention and treatment of Mptb and Mavs
infections in animals and humans based upon the recognition of
specific pathogenicity determinants.

Pathogenicity islands are, in general, 7-9kb regions of DNA
comprising a core domain with multiple ORFs and an adjacent
transmissible element. The transmissible element also encodes
proteins which may be linked to pathogenicity, such as by
providing receptors for cellular recognition. Pathogenicity
islands are envisaged as mobile packages of DNA which, when they
enter an organism, assist in bringing about its conversion from
a non-disease-causing to a disease-causing strain.

Figure 1(a) and (b) show a linear map of the pathogenicity
island GS in Mavs (Fig 1a) and in Mptb (Fig 1b). The main open
reading frames are illustrated as ORFs A to H. ORFs A to F are
found within the core region of GS. ORFs G and H are encoded by
the adjacent transmissible element portion of GS.

Description of the invention

Using a DNA-based differential analysis technology we have
discovered and characterised a novel polynucleotide in Mptb
(isolates 0022 from a Guernsey cow and 0021 from a red deer).
This polynucleotide comprises the gene region we have designated
GS. GS is found in Mptb using the identifier DNA sequences
Seq.ID No 1 and 2, where Seq.ID No 2 is the complementary
sequence of Seq.ID No 1. GS is also identified in Mavs. The
complete DNA sequence incorporating the positive strand of GS
from an isolate of Mavs comprising 7995 nucleotides, including
the core region of GS and adjacent transmissible element, is
given in Seq.ID No 3. DNA sequence comprising 4435 bp of the
positive strand of GS obtained from an isolate of Mptb including
the core region of GS (nucleotides 1614 to 6047 of GS in Mavs)
is given in Seq.ID No 4. The DNA sequence of GS from Mptb is
highly (99.4%) homologous to GS in Mavs. The remaining portion
of the DNA sequence of GS in Mptb is readily obtainable by a
person skilled in the art using standard laboratory procedures.
The entire functional DNA sequence including core region and
transmissible element of GS in Mptb and Mavs as described above,
comprise the polynucleotide sequences of the invention.

There are 8 open reading frames (ORFs) in GS. Six of these,
designated GSA, GSB, GSC, GSD, GSE and GSF, are encoded by the
core DNA region of GS which, characteristically for a
pathogenicity island, has a different GC content than the rest
of the microbial genome. Two ORFs designated GSG and GSH are
encoded by the transmissible element of GS whose GC content
resembles that of the rest of the mycobacterial genome. The ORF
GSH comprises two sub-ORFs on the complementary DNA strand
linked by a programmed frameshifting site so that a single
polypeptide is translated from the ORF GSH. The nucleotide
sequences of the 8 ORFs in GS and their translations are shown
in Seq.ID No 5 to Seq.ID No 29 as follows:

ORF A:  Seq.ID No 5   Nucleotides 50 to 427 of GS from Mavs
        Seq.ID No 6   Amino acid sequence encoded by Seq.ID No 5.

ORF B:  Seq.ID No 7   Nucleotides 772 to 1605 of GS from Mavs
        Seq.ID No 8   Amino acid sequence encoded by Seq.ID No 7.

ORF C:  Seq.ID No 9   Nucleotides 1814 to 2845 of GS from Mavs
        Seq.ID No 10  Amino acid sequence encoded by Seq.ID No 9.
        Seq.ID No 11  Nucleotides 201 to 1232 of GS from Mptb
        Seq.ID No 12  Amino acid sequence encoded by Seq.ID No 11.

ORF D:  Seq.ID No 13  Nucleotides 2785 to 3804 of GS from Mavs
        Seq.ID No 14  Amino acid sequence encoded by Seq.ID No 13.
        Seq.ID No 15  Nucleotides 1172 to 2191 of GS from Mptb
        Seq.ID No 16  Amino acid sequence encoded by Seq.ID No 15.

ORF E:  Seq.ID No 17  Nucleotides 4080 to 4802 of GS from Mavs
        Seq.ID No 18  Amino acid sequence encoded by Seq.ID No 17.
        Seq.ID No 19  Nucleotides 2467 to 3189 of GS from Mptb
        Seq.ID No 20  Amino acid sequence encoded by Seq.ID No 19.

ORF F:  Seq.ID No 21  Nucleotides 4947 to 5747 of GS from Mavs
        Seq.ID No 22  Amino acid sequence encoded by Seq.ID No 21.
        Seq.ID No 23  Nucleotides 3335 to 4135 of GS from Mptb
        Seq.ID No 24  Amino acid sequence encoded by Seq.ID No 23.

ORF G:  Seq.ID No 25  Nucleotides 6176 to 7042 of GS from Mavs
        Seq.ID No 26  Amino acid sequence encoded by Seq.ID No 25.

ORF H:  Seq.ID No 27  Nucleotides 7953 to 6215 from Mavs.

ORF H1: Seq.ID No 28  Amino acid sequence encoded by
                      nucleotides 7953 to 7006 of Seq.ID No 27

ORF H2: Seq.ID No 29  Amino acid sequence encoded by
                      nucleotides 7009 to 6215 of Seq.ID No 27
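
The coordinates listed above can be checked arithmetically. The
following Python sketch is an illustration only and forms no part
of the sequence listing: the names MAVS_ORFS, MPTB_ORFS,
orf_length, strand and mavs_minus_mptb_offsets are hypothetical,
and it assumes that the coordinates are 1-based and inclusive and
that a start greater than the end denotes the complementary
strand.

```python
# ORF coordinates as listed in the text (assumed 1-based, inclusive).
# A start greater than the end is taken to mean the complementary strand.
MAVS_ORFS = {
    "A": (50, 427),
    "B": (772, 1605),
    "C": (1814, 2845),
    "D": (2785, 3804),
    "E": (4080, 4802),
    "F": (4947, 5747),
    "G": (6176, 7042),
    "H": (7953, 6215),
    "H1": (7953, 7006),
    "H2": (7009, 6215),
}
MPTB_ORFS = {
    "C": (201, 1232),
    "D": (1172, 2191),
    "E": (2467, 3189),
    "F": (3335, 4135),
}


def orf_length(start, end):
    """Length in nucleotides of an inclusive interval on either strand."""
    return abs(end - start) + 1


def strand(start, end):
    """'+' for the positive strand, '-' for the complementary strand."""
    return "+" if start <= end else "-"


def mavs_minus_mptb_offsets():
    """Difference between Mavs and Mptb start positions for shared ORFs."""
    return {name: MAVS_ORFS[name][0] - MPTB_ORFS[name][0]
            for name in MPTB_ORFS}


if __name__ == "__main__":
    for name, (start, end) in MAVS_ORFS.items():
        n = orf_length(start, end)
        frame = "multiple of 3" if n % 3 == 0 else "not a multiple of 3"
        print(name, strand(start, end), n, "nt,", frame)
    print(mavs_minus_mptb_offsets())
```

On these figures every ORF except ORF H has a length divisible by
three. ORF H spans 1739 nucleotides, and H1 (948 nt) and H2
(795 nt) overlap by four nucleotides, which is consistent with the
frameshifting site described above. The Mavs and Mptb start
positions differ by 1613 for ORFs C, D and E and by 1612 for ORF F.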

The polynucleotides in Mtb with homology to the ORFs B, C, E and
F of GS in Mptb and Mavs, and the polypeptides they are now known
to encode as a result of our invention, are as follows:

ORF B:  Seq.ID No 30  Cosmid MTCY277 nucleotides 35493 to 34705
        Seq.ID No 31  Amino acid sequence encoded by Seq.ID No 30.

ORF C:  Seq.ID No 32  Cosmid MTCY277 nucleotides 31972 to 32994
        Seq.ID No 33  Amino acid sequence encoded by Seq.ID No 32.

ORF E:  Seq.ID No 34  Cosmid MTCY277 nucleotides 34687 to 33956
        Seq.ID No 35  Amino acid sequence encoded by Seq.ID No 34.

ORF E:  Seq.ID No 36  Cosmid MT024 nucleotides 15934 to 15203
        Seq.ID No 37  Amino acid sequence encoded by Seq.ID No 36.

ORF F:  Seq.ID No 38  Cosmid MT024 nucleotides 15133 to 14306
        Seq.ID No 39  Amino acid sequence encoded by Seq.ID No 38.

WO 97/23624    PCT/GB96/03221

The proteins and peptides encoded by the ORFs A to H in Mptb and
Mavs and the amino acid sequences from homologous genes we have
discovered in Mtb given in Seq.ID Nos 31, 33, 35, 37 and 39, as
described above and fragments thereof, comprise the polypeptides
of the invention. The polypeptides of the invention are believed
to be associated with specific immunoreactivity and with the
pathogenicity of the host micro-organisms from which they were
obtained.

The present invention thus provides a polynucleotide in
substantially isolated form, which is capable of selectively
hybridising to Seq.ID Nos 3 or 4 or a fragment thereof. The
polynucleotide fragment may alternatively comprise a sequence
selected from the group of Seq.ID No: 5, 7, 9, 11, 13, 15, 17,
19, 21, 23, 25 and 27. The invention further provides a
polynucleotide in substantially isolated form whose sequence
consists essentially of a sequence selected from the group
Seq.ID Nos 30, 32, 34, 36 and 38, or a corresponding sequence
selectively hybridizable thereto, or a fragment of said sequence
or corresponding sequence.

The invention further provides diagnostic probes such as a probe
which comprises a fragment of at least 15 nucleotides of a
polynucleotide of the invention, or a peptide nucleic acid or
similar synthetic sequence specific ligand, optionally carrying
a revealing label. The invention also provides a vector carrying
a polynucleotide as defined above, particularly an expression
vector.

The invention further provides a polypeptide in substantially
isolated form which comprises any one of the sequences selected
from the group consisting of Seq.ID No: 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 29, 31, 33, 35, 37 and 39, or a polypeptide
substantially homologous thereto. The invention additionally
provides a polypeptide fragment which comprises a fragment of a
polypeptide defined above, said fragment comprising at least 10
amino acids and an epitope. The invention also provides
polynucleotides in substantially isolated form which encode
polypeptides of the invention, and vectors which comprise such
polynucleotides, as well as antibodies capable of binding such
polypeptides. In an additional aspect, the invention provides
kits comprising polynucleotides, polypeptides, antibodies or
synthetic ligands of the invention and methods of using such kits
in diagnosing the presence or absence of mycobacteria in a
sample. The invention also provides pharmaceutical compositions
comprising polynucleotides of the invention, polypeptides of the
invention or antisense probes and the use of such compositions
in the treatment or prevention of diseases caused by
mycobacteria. The invention also provides polynucleotide [...]
the prevention and treatment of infections due to GS-containing
pathogenic mycobacteria in animals and humans and as a means of
enhancing in vivo susceptibility of said mycobacteria to
antimicrobial drugs. The invention also provides bacteria or
viruses transformed with polynucleotides of the invention for use
as vaccines. The invention further provides Mptb or Mavs in
which all or part of the polynucleotides of the invention have
been deleted or disabled to provide mutated organisms of lower
pathogenicity for use as vaccines in animals and humans. The
invention further provides Mtb in which all or part of the
polynucleotides encoding polypeptides of the invention have been
deleted or disabled to provide mutated organisms of lower
pathogenicity for use as vaccines in animals and humans.

A further aspect of the invention is our discovery of homologies
between the ORFs B, C and E in GS on the one hand, and Mtb cosmid
MTCY277 on the other (data from Genbank database using the
computer programmes BLAST and BLIXEM). The homologous ORFs in
MTCY277 are adjacent to one another, consistent with the form of
another pathogenicity island in Mtb. A further aspect of the
invention is our discovery of homologies between ORFs E and F in
GS and Mtb cosmid MT024 (also Genbank, as above) with the
homologous ORFs close to one another. The use of polynucleotides
and polypeptides from Mtb (Seq.ID Nos 30, 31, 32, 33, 34, 35, 36,
37, 38 and 39) in substantially isolated form as diagnostics,
vaccines and targets for chemotherapy, for the management and
prevention of Mtb infections in humans and animals, and the
processes involved in the preparation and use of these
diagnostics, vaccines and new chemotherapeutic agents, comprise
further aspects of the invention.

A. Polynucleotides

Polynucleotides of the invention as defined herein may comprise 
DNA or RNA. They may also be polynucleotides which include 
within them synthetic or modified nucleotides or peptide nucleic
acids. A number of different types of modification to
oligonucleotides are known in the art. These include
methylphosphonate and phosphorothioate backbones, and addition of
acridine or polylysine chains at the 3' and/or 5' ends of the
molecule. For the purposes of the present invention, it is to
be understood that the polynucleotides described herein may be 
modified by any method available in the art. Such modifications 
may be carried out in order to couple the said polynucleotide to 
a solid phase or to enhance the recognition, the in vivo 
activity, or the lifespan of polynucleotides of the invention. 

A number of different types of polynucleotides of the invention 
are envisaged. In the broadest aspect, polynucleotides and 
fragments thereof capable of hybridizing to SEQ ID NO: 3 or 4 form 
a first aspect of the invention. This includes the 
polynucleotide of SEQ ID NO: 3 or 4. Within this class of
polynucleotides various sub-classes of polynucleotides are of 
particular interest. 

One sub-class of polynucleotides which is of interest is the
class of polynucleotides encoding the open reading frames A, B,
C, D, E, F, G and H, including SEQ ID NOs: 5, 7, 9, 11, 13, 15,
17, 19, 21, 23, 25 and 27. As discussed below, polynucleotides
encoding ORF H include the polynucleotide sequences 7953 to 7006
and 7009 to 6215 within SEQ ID NO: 27, as well as modified
sequences in which the frame-shift has been modified so that the
two sub-reading frames are placed in a single reading frame.
This may be desirable where the polypeptide is to be produced in
recombinant expression systems.

The invention thus provides a polynucleotide in substantially
isolated form which encodes any one of these ORFs or combinations
thereof. Combinations thereof include combinations of 2, 3, 4,
5 or all of the ORFs. Polynucleotides may be provided which
comprise an individual ORF carried in a recombinant vector
including the vectors described herein. Thus in one preferred
aspect the invention provides a polynucleotide in substantially
isolated form capable of selectively hybridizing to the nucleic
acid comprising ORFs A to F of the core region of the Mptb and
Mavs pathogenicity islands of the invention. Fragments thereof
corresponding to ORFs A to E, B to F, A to D, B to E, A to C, B
to D or any two adjacent ORFs are also included in the invention.

Polynucleotides of the invention will be capable of selectively
hybridizing to the corresponding portion of the GS region, or to
the corresponding ORFs of Mtb described herein. The term
"selectively hybridizing" indicates that the polynucleotides will
hybridize, under conditions of medium to high stringency (for
example 0.03 M sodium chloride and 0.03 M sodium citrate at from
about 50°C to about 60°C) to the corresponding portion of SEQ ID
NO: 3 or 4 or the complementary strands thereof but not to genomic
DNA from mycobacteria which are usually non-pathogenic including
non-pathogenic species of M. avium. Such polynucleotides will
generally be at least 68%, e.g. at least 70%,
preferably at least 80 or 90% and more preferably at least 95%
homologous to the corresponding DNA of GS. The corresponding
portion will be over a region of at least 20, preferably at
least 30, for instance at least 40, 60 or 100 or more contiguous
nucleotides.

By "corresponding portion" it is meant a sequence from the GS
region of the same or substantially similar size which has been
determined, for example by computer alignment, to have the
greatest degree of homology to the polynucleotide.

Any combination of the above mentioned degrees of homology and
minimum sizes may be used to define polynucleotides of the
invention, with the more stringent combinations (i.e. higher
homology over longer lengths) being preferred. Thus for example
a polynucleotide which is at least 80% homologous over 25,
preferably 30 nucleotides forms one aspect of the invention, as
does a polynucleotide which is at least 90% homologous over 40
nucleotides.
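
By way of illustration only, a combination of a minimum homology
and a minimum length might be tested as in the following Python
sketch. It assumes that "homology" is measured as percent identity
over an ungapped window of contiguous nucleotides; a computer
alignment of the kind mentioned above would normally also allow
gaps. The function names percent_identity and meets_criterion are
hypothetical.

```python
def percent_identity(a, b):
    """Percentage of matching positions in two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_criterion(query, reference, min_identity, min_length):
    """True if any ungapped window of min_length contiguous nucleotides
    of the query matches a window of the reference at or above
    min_identity percent (brute-force scan, for short sequences only)."""
    if len(query) < min_length or len(reference) < min_length:
        return False
    for i in range(len(query) - min_length + 1):
        window = query[i:i + min_length]
        for j in range(len(reference) - min_length + 1):
            target = reference[j:j + min_length]
            if percent_identity(window, target) >= min_identity:
                return True
    return False
```

For example, meets_criterion(query, reference, 80, 25) corresponds
to the "at least 80% homologous over 25 nucleotides" combination,
and meets_criterion(query, reference, 90, 40) to the "at least 90%
homologous over 40 nucleotides" combination.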

A further class of polynucleotides of the invention is the class
of polynucleotides encoding polypeptides of the invention, the
polypeptides of the invention being defined in section B below.
Due to the redundancy of the genetic code as such,
polynucleotides may be of a lower degree of homology than
required for selective hybridization to the GS region. However,
when such polynucleotides encode polypeptides of the invention
these polynucleotides form a further aspect. It may for example
be desirable where polypeptides of the invention are produced
recombinantly to increase the GC content of such polynucleotides.
This increase in GC content may result in higher levels of
expression via codon usage more appropriate to the host cell in
which recombinant expression is taking place.
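
The use of the redundancy of the genetic code to raise GC content
without changing the encoded polypeptide can be sketched as
follows. This is a simplified illustration, not a method taken
from this specification: it uses the standard genetic code and
simply picks, for each amino acid, the synonymous codon with the
most G and C bases, whereas a practical design would use the codon
usage table of the chosen host cell. The names CODON_TABLE,
GC_RICHEST, translate and raise_gc are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def gc_count(seq):
    """Number of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq.upper())


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# For each amino acid (or stop), the synonymous codon richest in G/C.
GC_RICHEST = {}
for _codon, _aa in CODON_TABLE.items():
    best = GC_RICHEST.get(_aa)
    if best is None or gc_count(_codon) > gc_count(best):
        GC_RICHEST[_aa] = _codon


def raise_gc(dna):
    """Replace every codon by its GC-richest synonym (silent changes only)."""
    return "".join(GC_RICHEST[aa] for aa in translate(dna))
```

Because only synonymous codons are substituted,
translate(raise_gc(seq)) equals translate(seq) for any in-frame
coding sequence seq.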

An additional class of polynucleotides of the invention are those
obtainable from cosmids MTCY277 and MT024 (containing Mtb genomic
sequences), which polynucleotides consist essentially of the
fragment of the cosmid containing an open reading frame encoding
any one of the homologous ORFs B, C, E or F respectively. Such
polynucleotides are referred to below as Mtb polynucleotides.
However, where reference is made to polynucleotides in general
such reference includes Mtb polynucleotides unless the context
is explicitly to the contrary. In addition, the invention
provides polynucleotides which encode the same polypeptide as the
abovementioned ORFs of Mtb but which, due to the redundancy of
the genetic code, have different nucleotide sequences. These
form further Mtb polynucleotides of the invention. Fragments of
Mtb polynucleotides suitable for use as probes or primers also
form a further aspect of the invention.

The invention further provides polynucleotides in substantially
isolated form capable of selectively hybridizing (where
selectively hybridizing is as defined above) to the Mtb
polynucleotides of the invention.

The invention further provides the Mtb polynucleotides of the
invention linked at either the 5' and/or 3' end to
polynucleotide sequences to which they are not naturally
contiguous. Such sequences will typically be sequences found in
cloning or expression vectors, such as promoters, 5' untranslated
sequence, 3' untranslated sequence or termination sequences. The
sequences may also include further coding sequences such as
signal sequences used in recombinant production of proteins.

Further polynucleotides of the invention are illustrated in the 
accompanying examples. 

Polynucleotides of the invention may be used to produce a primer,
e.g. a PCR primer, a primer for an alternative amplification
reaction, a probe e.g. labelled with a revealing label by
conventional means using radioactive or non-radioactive labels
or a probe linked covalently to a solid phase, or the
polynucleotides may be cloned into vectors. Such primers,
probes and other fragments will be at least 15, preferably at
least 20, for example at least 25, 30 or 40 or more nucleotides
in length, and are also encompassed by the term polynucleotides
of the invention as used herein.

Primers of the invention which are preferred include primers
directed to any part of the ORFs defined herein. The ORFs from
other isolates of pathogenic mycobacteria which contain a GS
region may be determined and conserved regions within each
individual ORF may be identified. Primers directed to such
conserved regions form a further preferred aspect of the
invention. In addition, the primers and other polynucleotides
of the invention may be used to identify, obtain and isolate ORFs
capable of selectively hybridizing to the polynucleotides of the
invention which are present in pathogenic mycobacteria but which
are not part of a pathogenicity island in that particular species
of bacteria. Thus in addition to the ORFs B, C, E and F which
have been identified in Mtb, similar ORFs may be identified in
other pathogens and ORFs corresponding to the GS ORFs C, D, E,
F and H, may also be identified.

Polynucleotides such as DNA polynucleotides and probes according
to the invention may be produced recombinantly, synthetically,
or by any means available to those of skill in the art. They may
also be cloned by standard techniques.

In general, primers will be produced by synthetic means,
involving a step-wise manufacture of the desired nucleic acid
sequence one nucleotide at a time. Techniques for accomplishing
this using automated techniques are readily available in the art.
Longer polynucleotides will generally be produced using
recombinant means, for example using PCR (polymerase chain
reaction) cloning techniques. This will involve making a pair
of primers (e.g. of about 15-30 nucleotides) to a region of GS
which it is desired to clone, bringing the primers into contact
with genomic DNA from a mycobacterium or a vector carrying the
GS sequence, performing a polymerase chain reaction under
conditions which bring about amplification of the desired region,
isolating the amplified fragment (e.g. by purifying the reaction
mixture on an agarose gel) and recovering the amplified DNA. The
primers may be designed to contain suitable restriction enzyme
recognition sites so that the amplified DNA can be cloned into
a suitable cloning vector.
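
The primer-design step described above might be sketched in
Python as follows. This is a minimal illustration under stated
assumptions and not a protocol from this specification: the
function names reverse_complement, primer_pair and wallace_tm are
hypothetical, coordinates are taken as 1-based and inclusive, and
the melting temperature uses the rough Wallace rule (2°C per A or
T, 4°C per G or C), which is only a first approximation for short
oligonucleotides.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(template, start, end, length=20, site_fwd="", site_rev=""):
    """Forward and reverse primers for template[start..end].

    The forward primer copies the first `length` bases of the region;
    the reverse primer is the reverse complement of the last `length`
    bases. Optional restriction sites are added at the 5' ends.
    """
    if not 15 <= length <= 30:
        raise ValueError("primer length should be about 15-30 nucleotides")
    if start < 1 or end > len(template) or end - start + 1 < 2 * length:
        raise ValueError("region is out of range or too short")
    region = template[start - 1:end]
    forward = site_fwd + region[:length]
    reverse = site_rev + reverse_complement(region[-length:])
    return forward, reverse


def wallace_tm(primer):
    """Approximate melting temperature in degrees C (Wallace rule)."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc
```

For instance, primer_pair(template, 1814, 2845, length=20,
site_fwd="GAATTC", site_rev="GGATCC") would return primers
flanking the ORF C coordinates of GS from Mavs, carrying EcoRI and
BamHI recognition sites; the choice of these two enzymes is an
assumption made for the example.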

Such techniques may be used to obtain all or part of the GS or
ORF sequences described herein, as well as further genomic clones
containing full open reading frames. Although in general such
techniques are well known in the art, reference may be made in
particular to Sambrook J., Fritsch EF., Maniatis T (1989).
Molecular Cloning: A Laboratory Manual, 2nd edn. Cold Spring
Harbor, New York. Cold Spring Harbor Laboratory.

Polynucleotides which are not 100% homologous to the sequences
of the present invention but fall within the scope of the
invention can be obtained in a number of ways.

Other isolates or strains of pathogenic mycobacteria will be
expected to contain allelic variants of the GS sequences
described herein, and these may be obtained for example by
probing genomic DNA libraries made from such isolates or strains
of bacteria using GS or ORF sequences as probes under conditions
of medium to high stringency (for example 0.03M sodium chloride
and 0.03M sodium citrate at from about 50°C to about 60°C).

A particularly preferred group of pathogenic mycobacteria are 
isolates of M.paratuberculosis. Polynucleotides based on GS 
regions from such bacteria are particularly preferred. Preferred 
fragments of such regions include fragments encoding individual 
open reading frames including the preferred groups and 
combinations of open reading frames discussed above. 

Alternatively, such polynucleotides may be obtained by site
directed mutagenesis of the GS or ORF sequences or allelic
variants thereof. This may be useful where for example silent
codon changes are required to sequences to optimise codon
preferences for a particular host cell in which the
polynucleotide sequences are being expressed. Other sequence
changes may be desired in order to introduce restriction enzyme
recognition sites, or to alter the property or function of the
polypeptides encoded by the polynucleotides of the invention.
Such altered property or function will include the addition of
amino acid sequences of consensus signal peptides known in the
art to effect transport and secretion of the modified polypeptide
of the invention. Another altered property will include
mutagenesis of a catalytic residue or generation of fusion
proteins with another polypeptide. Such fusion proteins may be
with an enzyme, with an antibody or with a cytokine or other
ligand for a receptor, to target a polypeptide of the invention
to a specific cell type in vitro or in vivo.

The invention further provides double stranded polynucleotides 
comprising a polynucleotide of the invention and its complement. 

Polynucleotides or primers of the invention may carry a revealing
label. Suitable labels include radioisotopes such as 32P or 35S,
enzyme labels, other protein labels or smaller labels such as
biotin or fluorophores. Such labels may be added to
polynucleotides or primers of the invention and may be detected
using techniques known per se.

Polynucleotides or primers of the invention or fragments thereof,
labelled or unlabelled, may be used by a person skilled in the art
in nucleic acid-based tests for the presence or absence of Mptb,
Mavs, other GS-containing pathogenic mycobacteria, or Mtb applied
to samples of body fluids, tissues, or excreta from animals and
humans, as well as to food and environmental samples such as
river or ground water and domestic water supplies.

Human and animal body fluids include sputum, blood, serum,
plasma, saliva, milk, urine, CSF, semen, faeces and [...]
discharges. Tissues include intestine, mouth ulcers, skin, lymph
nodes, spleen, lung and liver obtained surgically or by a biopsy
technique. Animals particularly include commercial livestock
such as cattle, sheep, goats, deer and rabbits, but wild animals
and animals in zoos may also be tested.

Such tests comprise bringing a human or animal body fluid or
tissue extract, or an extract of an environmental or food sample,
into contact with a probe comprising a polynucleotide or primer
of the invention under hybridising conditions and detecting any
duplex formed between the probe and nucleic acid in the sample.
Such detection may be achieved using techniques such as PCR or
by immobilising the probe on a solid support, removing nucleic
acid in the sample which is not hybridized to the probe and then
detecting nucleic acid which has hybridized to the probe.
Alternatively, the sample nucleic acid may be immobilized on a
solid support, and the amount of probe bound to such a support
can be detected. Suitable assay methods of this and other
formats can be found in, for example, WO89/03891 and WO90/13667.

Polynucleotides of the invention or fragments thereof, labelled
or unlabelled, may also be used to identify and characterise
different strains of Mptb, Mavs, other GS-containing pathogenic
mycobacteria, or Mtb, and properties such as drug resistance or
susceptibility.

The probes of the invention may conveniently be packaged in the
form of a test kit in a suitable container. In such kits the
probe may be bound to a solid support where the assay format for
which the kit is designed requires such binding. The kit may
also contain suitable reagents for treating the sample to be
probed, hybridising the probe to nucleic acid in the sample,
control reagents, instructions, and the like.

The use of polynucleotides of the invention in the diagnosis of
inflammatory diseases such as Crohn's disease or sarcoidosis in
humans or Johne's disease in animals forms a preferred aspect of
the invention. The polynucleotides may also be used in the
prognosis of these diseases. For example, the response of a
human or animal subject to antibiotic, vaccination
or other therapies may be monitored by utilizing the diagnostic
methods of the invention over the course of a period of treatment
and following such treatment.

The use of Mtb polynucleotides (particularly in the form of
probes and primers) of the invention in the above-described
methods forms a further aspect of the invention, particularly for
the detection, diagnosis or prognosis of Mtb infections.



B. Polypeptides

Polypeptides of the invention include polypeptides in
substantially isolated form encoded by GS. This includes the
full length polypeptides encoded by the positive and
complementary negative strands of GS. Each of the full length
polypeptides will contain one of the amino acid sequences set out
in Seq ID Nos: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28 and
29. Polypeptides of the invention further include variants of
such sequences, including naturally occurring allelic variants
and synthetic variants which are substantially homologous to said
polypeptides. In this context, substantial homology is regarded
as a sequence which has at least 70%, e.g. 80%, 90%, 95% or 98%
amino acid homology (identity) over 30 or more, e.g. 40, 50 or 100
amino acids. For example, one group of substantially homologous
polypeptides are those which have at least 95% amino acid
identity to a polypeptide of any one of Seq ID Nos: 6, 8, 10, 12,
14, 16, 18, 20, 22, 24, 26, 28 and 29 over their entire length.
Even more preferably, this homology is 98%.

Polypeptides of the invention further include the polypeptide
sequences of the homologous ORFs of Mtb, namely Seq ID Nos: 31,
33, 35, 37 and 39. Unless explicitly specified to the contrary,
reference to polypeptides of the invention and their fragments
includes these Mtb polypeptides and fragments, and variants
thereof (substantially homologous to said sequences) as defined
herein.

Polypeptides of the invention may be obtained by the standard
techniques mentioned above. Polypeptides of the invention also
include fragments of the above mentioned full length polypeptides
and variants thereof, including fragments of the sequences set
out in SEQ ID NOs: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28,
29, 31, 33, 35, 37 and 39. Such fragments, for example of 8, 10,
12, 15 or up to 30 or 40 amino acids, may also be obtained
synthetically using standard techniques known in the art.

Preferred fragments include those which include an epitope, 
especially an epitope which is specific to the pathogenicity of 
the mycobacterial cell from which the polypeptide is derived. 
Suitable fragments will be at least about 5, e.g. 8, 10, 12, 15 
or 20 amino acids in size, or larger. Epitopes may be determined 
by techniques such as the peptide scanning techniques 
described by Geysen et al., Mol. Immunol., 21; 709-715 (1986), as 
well as other techniques known in the art. 
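
As an illustration of how a set of short fragments for peptide 
scanning might be enumerated, the following Python sketch lists 
every overlapping fragment of a chosen length from a polypeptide 
sequence. It is a minimal sketch only: the function name, the 
default length of 8 and the one-residue step are assumptions made 
for the example, and the sequence used in the test is hypothetical 
rather than one of the listed sequences. 

```python
def overlapping_peptides(sequence: str, length: int = 8, step: int = 1):
    """Return (start_position, fragment) pairs covering the sequence.

    Positions are 1-based. Fragments are contiguous, all of the
    same length, and successive fragments are offset by `step`.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [
        (i + 1, sequence[i:i + length])
        for i in range(0, len(sequence) - length + 1, step)
    ]
```

A sequence shorter than the requested fragment length yields an 
empty list. 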

The term "an epitope which is specific to the pathogenicity of 
the mycobacterial cell" means that the epitope is encoded by a 
portion of the GS region, or by the corresponding ORF sequences 
of Mtb, which can be used to distinguish mycobacteria which are 
pathogenic from related non-pathogenic mycobacteria, including 
non-pathogenic species of M. avium. This may be determined using 
routine methodology. A candidate epitope from an ORF may be 
prepared and used to immunise an animal such as a rat or rabbit 
in order to generate antibodies. The antibodies may then be used 
to detect the presence of the epitope in pathogenic mycobacteria 
and to confirm that non-pathogenic mycobacteria do not contain 
any proteins which react with the epitope. Epitopes may be 
linear or conformational. 

Polypeptides of the invention may be in a substantially isolated 
form. It will be understood that the polypeptide may be mixed 
with carriers or diluents which will not interfere with the 
intended purpose of the polypeptide and still be regarded as 
substantially isolated. A polypeptide of the invention may also 
be in a substantially purified form, in which case it will 
generally comprise the polypeptide in a preparation in which more 
than 90%, e.g. 95%, 98% or 99% of the polypeptide in the 
preparation is a polypeptide of the invention. 

Polypeptides of the invention may be modified to confer a desired 
property or function, for example by the addition of histidine 
residues to assist their purification or by the addition of a 
signal sequence to promote their secretion from a cell. 

Thus, polypeptides of the invention include fusion proteins which 
comprise a polypeptide encoded by all or part of one or more 
ORFs of the invention fused at the N- or C-terminus to a second 
sequence to provide the desired property or function. Sequences 
which promote secretion from a cell include, for example, the 
yeast α-factor signal sequence. 

A polypeptide of the invention may be labelled with a revealing 
label. The revealing label may be any suitable label which 
allows the polypeptide to be detected. Suitable labels include 
radioisotopes, e.g. 125I, 35S, enzymes, antibodies, polynucleotides 
and ligands such as biotin. Labelled polypeptides of the 
invention may be used in diagnostic procedures such as 
immunoassays in order to determine the amount of a polypeptide 
of the invention in a sample. Polypeptides or labelled 
polypeptides of the invention may also be used in serological or 
cell mediated immune assays for the detection of immune 
reactivity to said polypeptides in animals and humans using 
standard protocols. 



A polypeptide or labelled polypeptide of the invention or 
fragment thereof may also be fixed to a solid phase, for example 
the surface of an immunoassay well, microparticle, dipstick or 
biosensor. Such labelled and/or immobilized polypeptides may be 
packaged into kits in a suitable container along with suitable 
reagents, controls, instructions and the like. 

Such polypeptides and kits may be used in methods of detection 
of antibodies, or cell mediated immunoreactivity, to the 
mycobacterial proteins and peptides encoded by the ORFs of the 
invention and their allelic variants and fragments, using 
immunoassay. Such host antibodies or cell mediated immune 
reactivity will occur in humans or animals with an immune system 
which detects and reacts against polypeptides of the invention. 
The antibodies may be present in a biological sample from such 
humans or animals, where the biological sample may be a sample 
as defined above, particularly blood, milk or saliva. 

Immunoassay methods are well known in the art and will generally 
comprise: 

(a) providing a polypeptide of the invention comprising an 
epitope bindable by an antibody against said 
mycobacterial polypeptide; 

(b) incubating a biological sample with said polypeptide 
under conditions which allow for the formation of an 
antibody-antigen complex; and 

(c) determining whether antibody-antigen complex 
comprising said polypeptide is formed. 

Immunoassay methods for cell mediated immune reactivity in 
animals and humans are also well known in the art (e.g. as 
described by Weir et al. 1994, J. Immunol. Methods 176; 93-101) and 
will generally comprise: 

(a) providing a polypeptide of the invention comprising an 
epitope bindable by a lymphocyte or macrophage or 
other cell receptor; 

(b) incubating a cell sample with said polypeptide under 
conditions which allow for a cellular immune response 
such as release of cytokines or other mediator to 
occur; and 

(c) detecting the presence of said cytokine or mediator in 
the incubate. 

Polypeptides of the invention may be made by standard synthetic 
means well known in the art or recombinantly, as described below. 

Polypeptides of the invention or fragments thereof, labelled or 
unlabelled, may also be used to identify and characterize 
different strains of Mptb, Mavs, other GS-containing pathogenic 
mycobacteria, or Mtb, and properties such as drug resistance or 
susceptibility. 

The polypeptides of the invention may conveniently be packaged 
in the form of a test kit in a suitable container. In such kits 
the polypeptide may be bound to a solid support where the assay 
format for which the kit is designed requires such binding. The 
kit may also contain suitable reagents for treating the samples 
to be examined, control reagents, instructions, and the like. 

The use of polypeptides of the invention in the diagnosis of 
inflammatory diseases such as Crohn's disease or sarcoidosis in 
humans or Johne's disease in animals forms a preferred aspect of 
the invention. The polypeptides may also be used in the 
prognosis of these diseases. For example, the response of a 
human or animal subject to antibiotic or other 
therapies may be monitored by utilizing the diagnostic methods 
of the invention over the course of a period of treatment and 
following such treatment. 

The use of Mtb polypeptides of the invention in the above-described 
methods forms a further aspect of the invention, 
particularly for the detection, diagnosis or prognosis of Mtb 
infections. 

Polypeptides of the invention may also be used in assay methods 
for identifying candidate chemical compounds which will be useful 
in inhibiting, binding to or disrupting the function of said 
polypeptides required for pathogenicity. In general, such assays 
involve bringing the polypeptide into contact with a candidate 
inhibitor compound and observing the ability of the compound to 
disrupt, bind to or interfere with the polypeptide. 

There are a number of ways in which the assay may be formatted. 
For example, those polypeptides which have an enzymatic function 
may be assayed using labelled substrates for the enzyme, and the 
amount of, or rate of, conversion of the substrate into a product 
measured, e.g. by chromatography such as HPLC or by a colourimetric 
assay. Suitable labels include 35S, biotin or enzymes such 
as horse radish peroxidase. 
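
By way of illustration only, the effect of a candidate inhibitor 
on the rate of conversion could be summarised as follows. This 
Python sketch is an assumption-laden example, not a protocol from 
the specification: it assumes product amounts measured at several 
time points within the linear phase of the reaction, estimates the 
rate as the least-squares slope, and expresses inhibition relative 
to a control reaction without the compound. The function names are 
hypothetical. 

```python
def initial_rate(times, product):
    """Least-squares slope of product amount against time."""
    n = len(times)
    if n != len(product) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times) / n
    mean_p = sum(product) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times, product))
    return sxy / sxx


def percent_inhibition(rate_control, rate_with_compound):
    """Reduction in rate caused by the compound, as a percentage."""
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return 100.0 * (1.0 - rate_with_compound / rate_control)
```

The units of the rate follow from the units of the inputs, for 
example labelled product detected per minute. 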

For example, the gene product of ORF C is believed to have 
GDP-mannose dehydratase activity. Thus an assay for inhibitors of the 
gene product may utilise, for example, labelled GDP-mannose, GDP 
or mannose and the activity of the gene product followed. ORF 
D encodes a gene related to the synthesis and regulation of 
capsular polysaccharides, which are often associated with 
invasiveness and pathogenicity. Labelled polysaccharide 
substrates may be used in assays of the ORF D gene product. 
ORF F encodes a protein with putative glucosyl 
transferase activity and thus labelled amino sugars such as 
β-1-3-N-acetylglucosamine may be used as substrates in assays. 

Candidate chemical compounds which may be used may be natural or 
synthetic chemical compounds used in drug screening programmes. 
Extracts of plants which contain several characterised or 
uncharacterised components may also be used. 

Alternatively, a polypeptide of the invention may be screened 
against a panel of peptides, nucleic acids or other chemical 
functionalities which are generated by combinatorial chemistry. 
This will allow the definition of chemical entities which bind 
to polypeptides of the invention. Typically, the polypeptide of 
the invention will be brought into contact with a panel of 
compounds from a combinatorial library, with either the panel 
or the polypeptide being immobilized on a solid phase, under 
conditions suitable for the polypeptide to bind to the panel. 
The solid phase will then be washed under conditions in which 
only specific interactions between the polypeptide and individual 
members of the panel are retained, and those specific members may 
be utilized in further assays or used to design further panels 
of candidate compounds. 

For example, a number of assay methods to define peptide 
interaction with peptides are known. For example, WO86/00991 
describes a method for determining mimotopes which comprises 
making panels of catamer preparations, for example octamers of 
amino acids, at which one or more of the positions is defined and 
the remaining positions are randomly made up of other amino 
acids, determining which catamer binds to a protein of interest 
and re-screening the protein of interest against a further panel 
based on the most reactive catamer in which one or more 
additional designated positions are systematically varied. This 
may be repeated throughout a number of cycles and used to build 
up a sequence of a binding candidate compound of interest. 
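
The cycle of defining one further position per round can be 
sketched as below. This Python example is only an illustration of 
the bookkeeping and is not taken from WO86/00991: a template string 
stands for a catamer preparation, with "X" marking a position that 
is randomly made up and a one-letter code marking a defined 
position. The template notation and function name are assumptions. 

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def next_panel(template: str) -> list:
    """Templates that define exactly one more position than `template`.

    Each undefined position ('X') is replaced in turn by each of
    the 20 amino acids, leaving all other positions unchanged.
    """
    panel = []
    for i, residue in enumerate(template):
        if residue == "X":
            for aa in AMINO_ACIDS:
                panel.append(template[:i] + aa + template[i + 1:])
    return panel
```

Starting from "XXXXXXXX", the first panel holds 8 x 20 = 160 
templates; after the most reactive member is chosen, calling the 
function again on that member gives the panel for the next cycle. 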

WO89/03430 describes screening methods which permit the 
preparation of specific mimotopes which mimic the immunological 
activity of a desired analyte. These mimotopes are identified 
by reacting a panel of individual peptides wherein said peptides 
are of systematically varying hydrophobicity, amphipathic 
characteristics and charge patterns, using an antibody against 
an antigen of interest. Thus in the present case antibodies 
against a polypeptide of the invention may be employed and 
mimotope peptides from such panels may be identified. 

Polynucleotides of the invention can be incorporated into a 
recombinant replicable vector. The vector may be used to 
replicate the nucleic acid in a compatible host cell. Thus in 
a further embodiment, the invention provides a method of making 
polynucleotides of the invention by introducing a polynucleotide 
of the invention into a replicable vector, introducing the vector 
into a compatible host cell, and growing the host cell under 
conditions which bring about replication of the vector. The 
vector may be recovered from the host cell. Suitable host cells 
are described below in connection with expression vectors. 



Expression Vectors. 

Preferably, a polynucleotide of the invention in a vector is 
operably linked to a control sequence which is capable of 
providing for the expression of the coding sequence by the host 
cell, i.e. the vector is an expression vector. The term "operably 
linked" refers to a juxtaposition wherein the components 
described are in a relationship permitting them to function in 
their intended manner. A control sequence "operably linked" to 
a coding sequence is ligated in such a way that expression of the 
coding sequence is achieved under conditions compatible with the 
control sequences. Such vectors may be transformed into a 
suitable host cell as described above to provide for expression 
of a polypeptide of the invention. Thus, in a further aspect the 
invention provides a process for preparing polypeptides according 
to the invention which comprises cultivating a host cell 
transformed or transfected with an expression vector as described 
above, under conditions to provide for expression by the vector 
of a coding sequence encoding the polypeptides, and recovering 
the expressed polypeptides. 

A further embodiment of the invention provides vectors for the 
replication and expression of polynucleotides of the invention, 
or fragments thereof. The vectors may be, for example, plasmid, 
virus or phage vectors provided with an origin of replication, 
optionally a promoter for the expression of the said 
polynucleotide and optionally a regulator of the promoter. The 
vectors may contain one or more selectable marker genes, for 
example an ampicillin resistance gene in the case of a bacterial 
plasmid or a neomycin resistance gene for a mammalian vector. 
Vectors may be used in vitro, for example for the production of 
RNA, or used to transfect or transform a host cell. The vector 
may also be adapted to be used in vivo, for example in a method 
of naked DNA vaccination or gene therapy. A further embodiment 
of the invention provides host cells transformed or transfected 
with the vectors for the replication and expression of 
polynucleotides of the invention, including the DNA of GS, the 
open reading frames thereof and other corresponding ORFs, 
particularly ORFs B, C, E and F from Mtb. The cells will be 
chosen to be compatible with the said vector and may for example 
be bacterial, yeast, insect or mammalian. 

Expression vectors are widely available in the art and can be 
obtained commercially. Mammalian expression vectors may comprise 
a mammalian or viral promoter. Mammalian promoters include the 
metallothionein promoter. Viral promoters include promoters from 
adenovirus, the SV40 large T promoter and retroviral LTR 
promoters. Promoters compatible with insect cells include the 
polyhedrin promoter. Yeast promoters include the alcohol 
dehydrogenase promoter. Bacterial promoters include the 
β-galactosidase promoter. 

The expression vectors may also comprise enhancers, and in the 
case of eukaryotic vectors a polyadenylation signal sequence 
downstream of the coding sequence being expressed. 

Polypeptides of the invention may be expressed in suitable host 
cells, for example bacterial, yeast, plant, insect and mammalian 
cells, and recovered using standard purification techniques 
including, for example affinity chromatography, HPLC or other 
chromatographic separation techniques. 

Polynucleotides according to the invention may also be inserted 
into the vectors described above in an antisense orientation in 
order to provide for the production of antisense RNA. Antisense 
RNA or other antisense polynucleotides or ligands may also be 
produced by synthetic means. Such antisense polynucleotides may 
be used in a method of controlling the levels of the proteins 
encoded by the ORFs of the invention in a mycobacterial cell. 
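
To illustrate what insertion in an antisense orientation implies 
at the sequence level, the following Python sketch computes the 
reverse complement of a DNA sequence, which is the strand read 
when a coding sequence is placed in the opposite orientation 
relative to the promoter. It is a minimal sketch: the function 
name is hypothetical, only unambiguous bases (A, C, G, T) are 
accepted, and the sequence in the test is an arbitrary example. 

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    unexpected = set(dna) - set("ACGTacgt")
    if unexpected:
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return dna.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence. 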

Polynucleotides of the invention may also be carried by vectors 
suitable for gene therapy methods. Such gene therapy methods 
include those designed to provide vaccination against diseases 
caused by pathogenic mycobacteria or to boost the immune response 
of a human or animal infected with a pathogenic mycobacterium. 

For example, Ziegner et al., AIDS, 1995, 9; 43-50 describes the use 
of a replication defective recombinant amphotropic retrovirus to 
boost the immune response in patients with HIV infection. Such 
a retrovirus may be modified to carry a polynucleotide encoding 
a polypeptide or fragment thereof of the invention and the 
retrovirus delivered to the cells of a human or animal subject 
in order to provide an immune response against said polypeptide. 
The retrovirus may be delivered directly to the patient or may 
be used to infect cells ex vivo, e.g. fibroblast cells, which 
are then introduced into the patient, optionally after being 
inactivated. The cells are desirably autologous or HLA-matched 
cells from the human or animal subject. 

Gene therapy methods including methods for boosting an immune 
response to a particular pathogen are disclosed generally in, for 
example, WO95/14091, the disclosure of which is incorporated herein 
by reference. Recombinant viral vectors include retroviral 
vectors, adenoviral vectors, adeno-associated viral vectors, 
vaccinia virus vectors, herpes virus vectors and alphavirus 
vectors. Alphavirus vectors are described in, for example, 
WO95/07994, the disclosure of which is incorporated herein by 
reference. 

Where direct administration of the recombinant viral vector is 
contemplated, either in the form of naked nucleic acid or in the 
form of packaged particles carrying the nucleic acid, this may be 
done by any suitable means, for example oral administration or 
intravenous injection. From 10^ to 10- c.f.u. of virus represents 
a typical dose, which may be repeated for example weekly over a 
period of a few months. Administration of autologous or 
HLA-matched cells infected with the virus may be more convenient in 
some cases. This will generally be achieved by administering 
doses, for example from 10^ to 10' cells per dose, which may be 
repeated as described above. 

The recombinant viral vector may further comprise nucleic acid 
capable of expressing an accessory molecule of the immune system 
designed to increase the immune response. Such a molecule may 
be, for example, an interferon, particularly interferon gamma, an 
interleukin, for example IL-1α, IL-1β or IL-2, or an MHC class 
I or II molecule. This may be particularly desirable where the 
vector is intended for use in the treatment of humans or animals 
already infected with a mycobacterium and it is desired to boost 
the immune response. 

The invention also provides monoclonal or polyclonal antibodies 
to polypeptides of the invention or fragments thereof. The 
invention further provides a process for the production of 
monoclonal or polyclonal antibodies to polypeptides of the 
invention. Monoclonal antibodies may be prepared by conventional 
hybridoma technology using the polypeptides of the invention or 
peptide fragments thereof, as immunogens. Polyclonal antibodies 
may also be prepared by conventional means which comprise 
inoculating a host animal, for example a rat or a rabbit, with 
a polypeptide of the invention or peptide fragment thereof and 
recovering immune serum. 

In order that such antibodies may be made, the invention also 
provides polypeptides of the invention or fragments thereof 
haptenised to another polypeptide for use as immunogens in 
animals or humans. 

For the purposes of this invention, the term "antibody", unless 
specified to the contrary, includes fragments of whole antibodies 
which retain their binding activity for a polypeptide of the 
invention. Such fragments include Fv, F(ab') and F(ab')2 
fragments, as well as single chain antibodies. Furthermore, the 
antibodies and fragments thereof may be humanised antibodies, 
e.g. as described in EP-A-239400. 

Antibodies may be used in methods of detecting polypeptides of 
the invention present in biological samples (where such samples 
include the human or animal body samples, and environmental 
samples, mentioned above) by a method which comprises: 

(a) providing an antibody of the invention; 

(b) incubating a biological sample with said antibody 
under conditions which allow for the formation of an 
antibody-antigen complex; and 

(c) determining whether antibody-antigen complex 
comprising said antibody is formed. 

Antibodies of the invention may be bound to a solid support, for 
example an immunoassay well, microparticle, dipstick or biosensor, 
and/or packaged into kits in a suitable container along with 
suitable reagents, controls, instructions and the like. 

Antibodies of the invention may be used in the detection, 
diagnosis and prognosis of diseases as described above in 
relation to polypeptides of the invention. 

Compositions. 

The present invention also provides compositions comprising a 
polynucleotide or polypeptide of the invention together with a 
carrier or diluent. Compositions of the invention also include 
compositions comprising a nucleic acid, particularly an 
expression vector, of the invention. Compositions further 
include those carrying a recombinant virus of the invention. 
Such compositions include pharmaceutical compositions, in which 
case the carrier or diluent will be pharmaceutically acceptable. 

Pharmaceutically acceptable carriers or diluents include those 
used in formulations suitable for inhalation as well as oral, 
parenteral (e.g. intramuscular or intravenous or transcutaneous) 
administration. The formulations may conveniently be presented 
in unit dosage form and may be prepared by any of the methods 
well known in the art of pharmacy. Such methods include the step 
of bringing into association the active ingredient with the 
carrier which constitutes one or more accessory ingredients. In 
general the formulations are prepared by uniformly and intimately 
bringing into association the active ingredient with liquid 
carriers or finely divided solid carriers or both, and then, if 
necessary, shaping the product. 

For example, formulations suitable for parenteral administration 
include aqueous and non-aqueous sterile injection solutions which 
may contain anti-oxidants, buffers, bacteriostats and solutes 
which render the formulation isotonic with the blood of the 
intended recipient, and aqueous and non-aqueous sterile 
suspensions which may include suspending agents and thickening 
agents and liposomes or other microparticulate systems which are 
designed to target the polynucleotide or the polypeptide of the 
invention to blood components or one or more organs, or to target 
cells such as M cells of the intestine after oral administration. 

Vaccines. 

In another aspect, the invention provides novel vaccines for the 
prevention and treatment of infections caused by Mptb, Mavs, 
other GS-containing pathogenic mycobacteria and Mtb in animals 
and humans. The term "vaccine" as used herein means an agent 
used to stimulate the immune system of a vertebrate, particularly 
a warm blooded vertebrate including humans, so as to provide 
protection against future harm by an organism to which the 
vaccine is directed or to assist in the eradication of an 
organism in the treatment of established infection. The immune 
system will be stimulated by the production of cellular immunity 
and antibodies, desirably neutralizing antibodies, directed to 
epitopes found on or in a pathogenic mycobacterium which 
expresses any one of the ORFs of the invention. The antibody so 
produced may be any of the immunological classes, such as the 
immunoglobulins A, D, E, G or M. Vaccines which stimulate the 
production of IgA are of interest since this is the principal 
immunoglobulin produced by the secretory system of warm-blooded 
animals, and the production of such antibodies will help prevent 
infection or colonization of the intestinal tract. However, an 
IgM and IgG response will also be desirable for systemic 
infections such as Crohn's disease or tuberculosis. 

Vaccines of the invention include polynucleotides of the 
invention or fragments thereof in suitable vectors and 
administered by injection of naked DNA using standard protocols. 
Polynucleotides of the invention or fragments thereof in suitable 
vectors for the expression of the polypeptides of the invention 
may be given by injection, inhalation or by mouth. Suitable 
vectors include M. bovis BCG, M. smegmatis or other mycobacteria, 
Corynebacteria, Salmonella or other agents according to 
established protocols. 



PCT/GB96/03221 

WO 97/23624 






Polypeptides of the invention or fragments thereof in 
substantially isolated form may be used as vaccines by injection, 
inhalation, oral administration or by transcutaneous application 
according to standard protocols. Adjuvants (such as Iscoms or 
polylactide-coglycolide encapsulation), cytokines such as IL-12 
and other immunomodulators may be used for the selective 
enhancement of the cell mediated or humoral immunological 
responses. Vaccination with polynucleotides and/or polypeptides 
of the invention may be undertaken to increase the susceptibility 
of pathogenic mycobacteria to antimicrobial agents in vivo. 

In instances wherein the polypeptide is correctly configured so 
as to provide the correct epitope, but is too small to be 
immunogenic, the polypeptide may be linked to a suitable carrier. 

A number of techniques for obtaining such linkage are known in 
the art, including the formation of disulfide linkages using 
N-succinimidyl-3-(2-pyridylthio)propionate (SPDP) and succinimidyl 
4-(N-maleimido-methyl)cyclohexane-1-carboxylate (SMCC) obtained 
from Pierce Company, Rockford, Illinois. (If the peptide lacks 
a sulfhydryl group, this can be provided by addition of a 
cysteine residue.) These reagents create a disulfide linkage 
between themselves and peptide cysteine residues on one protein 
and an amide linkage through the epsilon-amino on a lysine, or 
other free amino group in the other. A variety of such 
disulfide/amide-forming agents are known. See, for example, 
Immun. Rev. (1982) 62:185. Other bifunctional coupling agents form 
a thioether rather than a disulfide linkage. Many of these 
thioether-forming agents are commercially available and include 
reactive esters of 6-maleimidocaproic acid, 2-bromoacetic acid, 
2-iodoacetic acid, 4-(N-maleimido-methyl)cyclohexane-1-carboxylic 
acid, and the like. The carboxyl groups can be activated by 
combining them with succinimide or 1-hydroxyl-2-nitro-4-sulfonic 
acid, sodium salt. An additional method of coupling antigens 
employs the rotavirus/"binding peptide" system described in EPO 
Pub. No. 259,149, the disclosure of which is incorporated herein 
by reference. The foregoing list is not meant to be exhaustive, 
and modifications of the named compounds can clearly be used. 






Any carrier may be used which does not itself induce the 
production of antibodies harmful to the host. Suitable carriers 
are typically large, slowly metabolized macromolecules such as 
proteins; polysaccharides, such as latex functionalized 
Sepharose®, agarose, cellulose, cellulose beads and the like; 
polymeric amino acids, such as polyglutamic acid, polylysine, 
polylactide-coglycolide and the like; amino acid copolymers; and 
inactive virus particles. Especially useful protein substrates 
are serum albumins, keyhole limpet hemocyanin, immunoglobulin 
molecules, thyroglobulin, ovalbumin, tetanus toxoid, and other 
proteins well known to those skilled in the art. 

The immunogenicity of the epitopes may also be enhanced by 
preparing them in mammalian or yeast systems fused with or 
assembled with particle-forming proteins such as, for example, 
that associated with hepatitis B surface antigen. See, e.g., 
US-A-4,722,840. Constructs wherein the epitope is linked directly 
to the particle-forming protein coding sequences produce hybrids 
which are immunogenic with respect to the epitope. In addition, 
all of the vectors prepared include epitopes specific to HBV, 
having various degrees of immunogenicity, such as, for example, 
the pre-S peptide. 

In addition, portions of the particle-forming protein coding 
sequence may be replaced with codons encoding an epitope of the 
invention. In this replacement, regions which are not required 
to mediate the aggregation of the units to form immunogenic 
particles in yeast or mammals can be deleted, thus eliminating 
additional HBV antigenic sites from competition with the epitope 
of the invention. 

Vaccines may be prepared from one or more immunogenic 
polypeptides of the invention. These polypeptides may be 
expressed in various host cells (e.g., bacteria, yeast, insect, 
or mammalian cells), or alternatively may be isolated from viral 
preparations or made synthetically. 



In addition to the above, it is also possible to prepare live 
vaccines of attenuated microorganisms which express one or more 
recombinant polypeptides of the invention. Suitable attenuated 
microorganisms are known in the art and include, for example, 
viruses (e.g., vaccinia virus), as well as bacteria. 

The preparation of vaccines which contain an immunogenic 
polypeptide(s) as active ingredients is known to one skilled in 
the art. Typically, such vaccines are prepared as injectables, 
or as suitably encapsulated oral preparations, and either as liquid 
solutions or suspensions; solid forms suitable for solution in, 
or suspension in, liquid prior to ingestion or injection may also 
be prepared. The preparation may also be emulsified, or the 
protein encapsulated in liposomes. The active immunogenic 
ingredients are often mixed with excipients which are 
pharmaceutically acceptable and compatible with the active 
ingredient. Suitable excipients are, for example, water, saline, 
dextrose, glycerol, ethanol, or the like and combinations 
thereof. In addition, if desired, the vaccine may contain minor 
amounts of auxiliary substances such as wetting or emulsifying 
agents, pH buffering agents, and/or adjuvants which enhance the 
effectiveness of the vaccine. Examples of adjuvants which may 
be effective include but are not limited to: aluminum hydroxide, 
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), 
N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as 
nor-MDP), 
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine 
(CGP 19835A, referred to as MTP-PE), and RIBI, which contains 
three components extracted from bacteria, monophosphoryl lipid 
A, trehalose dimycolate and cell wall skeleton (MPL+TDM+CWS) in 
a 2% squalene/Tween® 80 emulsion. The effectiveness of an 
adjuvant may be determined by measuring the amount of antibodies 
directed against an immunogenic polypeptide containing an 
antigenic sequence resulting from administration of this 
polypeptide in vaccines which are also comprised of the various 
adjuvants. 

The vaccines are conventionally administered parenterally, by 
injection, for example, either subcutaneously or intramuscularly. 
Additional formulations which are suitable for other modes of 
administration include suppositories, oral formulations or 
enemas. For suppositories, traditional binders and carriers may 
include, for example, polyalkylene glycols or triglycerides; such 
suppositories may be formed from mixtures containing the active 
ingredient in the range of 0.5% to 10%, preferably 1% - 2%. Oral 
formulations include such normally employed excipients as, for 
example, pharmaceutical grades of mannitol, lactose, starch, 
magnesium stearate, sodium saccharine, cellulose, magnesium 
carbonate, and the like. These compositions take the form of 
solutions, suspensions, tablets, pills, capsules, sustained 
release formulations or powders and contain 10% - 95% of active 
ingredient, preferably 25% - 70%. 

The proteins may be formulated into the vaccine as neutral or
salt forms. Pharmaceutically acceptable salts include the acid
addition salts (formed with free amino groups of the peptide) and
which are formed with inorganic acids such as, for example,
hydrochloric or phosphoric acids, or such organic acids as
acetic, oxalic, tartaric, maleic, and the like. Salts formed
with the free carboxyl groups may also be derived from inorganic
bases such as, for example, sodium, potassium, ammonium, calcium,
or ferric hydroxides, and such organic bases as isopropylamine,
trimethylamine, 2-ethylamino ethanol, histidine, procaine, and
the like.

The vaccines are administered in a manner compatible with the
dosage formulation, and in such amount as will be
prophylactically and/or therapeutically effective. The quantity
to be administered, which is generally in the range of 5 µg to
250 µg of antigen per dose, depends on the subject to be treated,
capacity of the subject's immune system to synthesize antibodies,
mode of administration and the degree of protection desired.
Precise amounts of active ingredient required to be administered
may depend on the judgement of the practitioner and may be
peculiar to each subject.

The vaccine may be given in a single dose schedule, or preferably
in a multiple dose schedule. A multiple dose schedule is one in
which a primary course of vaccination may be with 1-10 separate
doses, followed by other doses given at subsequent time intervals
required to maintain and/or reinforce the immune response, for
example, at 1-4 months for a second dose, and if needed, a
subsequent dose(s) after several months. The dosage regimen will
also, at least in part, be determined by the need of the
individual and be dependent upon the judgement of the
practitioner.

In a further aspect of the invention, there is provided an
attenuated vaccine comprising a normally pathogenic mycobacterium
which harbours an attenuating mutation in any one of the genes
encoding a polypeptide of the invention. The gene is selected
from the group of ORFs A, B, C, D, E, F, G and H, including the
homologous ORFs B, C, E and F in Mtb.

The mycobacteria may be used in the form of killed bacteria or 
as a live attenuated vaccine. There are advantages to a live 
attenuated vaccine. The whole live organism is used, rather than 
dead cells or selected cell components which may exhibit modified 
or denatured antigens. Protein antigens in the outer membrane 
will maintain their tertiary and quaternary structures. 
Therefore the potential to elicit a good protective long term 
immunity should be higher. 

The term "mutation" and the like refers to a genetic lesion in
a gene which renders the gene non-functional. This may be at
either the level of transcription or translation. The term thus 
envisages deletion of the entire gene or substantial portions 
thereof, and also point mutations in the coding sequence which 
result in truncated gene products unable to carry out the normal 
function of the gene. 

A mutation introduced into a bacterium of the invention will 
generally be a non-reverting attenuating mutation. Non-reverting 
means that for practical purposes the probability of the mutated 
gene being restored to its normal function is small, for example 
less than 1 in 10^6 such as less than 1 in 10^9 or even less than
1 in 10^12.

An attenuated mycobacterium of the invention may be in isolated
form. This is usually desirable when the bacterium is to be used
for the purposes of vaccination. The term "isolated" means that
the bacterium is in a form in which it can be cultured, processed
or otherwise used in a form in which it can be readily identified
and in which it is substantially uncontaminated by other
bacterial strains, for example non-attenuated parent strains or
unrelated bacterial strains. The term "isolated bacterium" thus
encompasses cultures of a bacterial mutant of the invention, for
example in the form of colonies on a solid medium or in the form
of a liquid culture, as well as frozen or dried preparations of
the strains.

In a preferred aspect, the attenuated mycobacterium further
comprises at least one additional mutation. This may be a
mutation in a gene responsible for the production of products
essential to bacterial growth which are absent in a human or
animal host. For example, mutations to the gene for aspartate
semialdehyde dehydrogenase (asd) have been proposed for the
production of attenuated strains of Salmonella. The asd gene is
described further in Gene (1993) 121, 123-128. A lesion in the
asd gene, encoding the enzyme aspartate β-semialdehyde
dehydrogenase, would render the organism auxotrophic for the
essential nutrient diaminopimelic acid (DAP), which can be provided
exogenously during bulk culture of the vaccine strain. Since
this compound is an essential constituent of the cell wall for
gram-negative and some gram-positive organisms and is absent from
mammalian or other vertebrate tissues, mutants would undergo
lysis after about three rounds of division in such tissues.
Analogous mutations may be made to the attenuated mycobacteria
of the invention.

In addition or in the alternative, the attenuated mycobacteria
may carry a recA mutation. The recA mutation knocks out
homologous recombination - the process which is exploited for the
construction of the mutations. Once the recA mutation has been
incorporated the strain will be unable to repair the constructed
deletion mutations. Such a mutation will provide attenuated
strains in which the possibility of homologous recombination
with DNA from wild-type strains has been minimized. RecA genes
have been widely studied in the art and their sequences are
available. Further modifications may be made for additional
safety.

The invention further provides a process for preparing a vaccine
composition comprising an attenuated bacterium according to the
invention, which process comprises (a) inoculating a culture vessel
containing a nutrient medium suitable for growth of said
bacterium; (b) culturing said bacterium; (c) recovering said
bacteria and (d) mixing said bacteria with a pharmaceutically
acceptable diluent or carrier.

Attenuated bacterial strains according to the invention may be
constructed using recombinant DNA methodology which is known per
se. In general, bacterial genes may be mutated by a process of
targeted homologous recombination in which a DNA construct
containing a mutated form of the gene is introduced into a host
bacterium which it is desired to attenuate. The construct will
recombine with the wild-type gene carried by the host and thus
the mutated gene may be incorporated into the host genome to
provide a bacterium of the present invention which may then be
isolated.

The mutated gene may be obtained by introducing deletions into
the gene, e.g. by digesting with a restriction enzyme which cuts
the coding sequence twice to excise a portion of the gene and
then religating under conditions in which the excised portion is
not reintroduced into the cut gene. Alternatively, frame shift
mutations may be introduced by cutting with a restriction enzyme
which leaves overhanging 5' and 3' termini, filling in and/or
trimming back the overhangs, and religating. Similar mutations
may be made by site directed mutagenesis. These are only
examples of the types of techniques which will readily be at the
disposal of those of skill in the art.
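
The two routes described above (excising the segment between two cuts, or filling in an overhang before religation) can be sketched on a string model of a gene. This is an illustrative sketch only, not the procedure used in the specification: the toy sequence, the choice of EcoRI (GAATTC) as the enzyme, and the function names are all assumptions, and the deletion is modelled simply as joining the gene at the two recognition sites.

```python
def delete_between_sites(gene: str, site: str) -> str:
    """Excise the segment between the first two occurrences of `site`
    and religate, leaving a single copy of the site (simplified model)."""
    first = gene.find(site)
    second = gene.find(site, first + 1) if first != -1 else -1
    if second == -1:
        raise ValueError("the enzyme must cut the sequence at least twice")
    return gene[:first] + gene[second:]


def fill_in_ecori(gene: str) -> str:
    """Model cutting at the first EcoRI site (G^AATTC), filling in the
    4-base overhang and blunt-end religating: GAATTC becomes GAATTAATTC."""
    return gene.replace("GAATTC", "GAATTAATTC", 1)


def causes_frameshift(original: str, mutated: str) -> bool:
    """A length change that is not a multiple of three shifts the frame."""
    return (len(original) - len(mutated)) % 3 != 0


# Hypothetical toy coding sequence containing two EcoRI sites.
gene = "ATGAAAGAATTCCCGGGTTTGAATTCAAATAA"

deleted = delete_between_sites(gene, "GAATTC")
filled = fill_in_ecori(gene)

print(deleted, causes_frameshift(gene, deleted))
print(filled, causes_frameshift(gene, filled))
```

In this toy case the deletion removes 14 bases and the fill-in adds 4, so both changes disrupt the reading frame downstream of the lesion.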

Various assays are available to detect successful recombination.
In the case of attenuations which mutate a target gene necessary
for the production of an essential metabolite or catabolite
compound, selection may be carried out by screening for bacteria
unable to grow in the absence of such a compound. Bacteria may
also be screened with antibodies or nucleic acids of the
invention to determine the absence of production of a mutated
gene product of the invention or to confirm that the genetic
lesion introduced - e.g. a deletion - has been incorporated into
the genome of the attenuated strain.

The concentration of the attenuated strain in the vaccine will
be formulated to allow convenient unit dosage forms to be
prepared. Concentrations of from about 10* to 10» bacteria per
ml will generally be suitable, e.g. from about 10* to 10« such as
about 10* per ml. Live attenuated organisms may be administered
subcutaneously or intramuscularly at up to 10» organisms in one
or more doses, e.g. from around 10^ to 10«, e.g. about 10« or 10'
organisms in a single dose.

The vaccines of the invention may be administered to recipients 
to treat established disease or in order to protect them against 
diseases caused by the corresponding wild type mycobacteria, such 
as inflammatory diseases such as Crohn's disease or sarcoidosis 
in humans or Johne's disease in animals. The vaccine may be 
administered by any suitable route. In general, subcutaneous or 
intramuscular injection is most convenient, but oral, intranasal 
and colorectal administration may also be used. 

The following Examples illustrate aspects of the invention.



EXAMPLE 1



Tests for the presence of the GS identifier sequence were
performed on B/xl bacterial DNA extracts (25 µg/ml to 500 µg/ml)
using polymerase chain reaction based on the oligonucleotide
primers 5'-GATGCCGTGAGGAGGTAAAGCTGC-3' (Seq ID No. 40) and
5'-GATACGGCTCTTGAATCCTGCACG-3' (Seq ID No. 41) from within the
identifier DNA sequences (Seq. ID Nos 1 and 2). PCR was performed
for 40 cycles in the presence of 1.5 mM magnesium and an
annealing temperature of 58°C. The presence or absence of the
correct amplification product indicated the presence or absence
of GS identifier sequence in the corresponding bacterium. GS
identifier sequence is shown to be present in all the laboratory
and field strains of Mptb and Mavs tested. This includes Mptb
isolates 0025 (bovine OIL Weybridge), 0021 (caprine, Moredun),
0022 (bovine, Moredun), 0139 (human, Chiodini 1984), 0209,
0208, 0211, 0210, 0212, 0207, 0204, 0206 (bovine, Whipple 1990).
All Mptb strains were IS900 positive. The Mavs strains include
0010 and 0012 (woodpigeon, Thorel), 0018 (armadillo, Portaels) and
0034, 0037, 0038, 0040 (AIDS, Hoffner). All Mavs strains were
IS902 positive. One pathogenic M.avium strain 0033 (AIDS,
Hoffner) also contained GS identifier sequence. GS identifier
sequence is absent from other mycobacteria including other
M.avium, M.malmoense, M.szulgai, M.gordonae, M.chelonei,
M.fortuitum, M.phlei, as well as E.coli, S.aureus, Nocardia sp.,
Streptococcus sp., Shigella sp. and Pseudomonas sp.
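
The presence/absence logic of this PCR test can be illustrated in silico. The sketch below is an assumption-laden illustration rather than part of the described method: it searches a single template strand for the forward primer and for the reverse complement of the reverse primer, and the template used is a made-up toy string (not Seq. ID No. 1), so the product length it reports says nothing about the real amplicon. Only the two primer sequences are taken from the text.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the predicted product on this strand, or None if either
    primer site is missing (exact matches only, a deliberate simplification)."""
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


FWD = "GATGCCGTGAGGAGGTAAAGCTGC"  # Seq ID No. 40 (from the text)
REV = "GATACGGCTCTTGAATCCTGCACG"  # Seq ID No. 41 (from the text)

# Hypothetical toy template: flanking filler plus both primer sites.
toy_template = "TTTT" + FWD + "ACGTACGTAC" + reverse_complement(REV) + "GGGG"

product = amplicon(toy_template, FWD, REV)
negative = amplicon("A" * 100, FWD, REV)

print(len(product) if product else "no product")
print("no product" if negative is None else len(negative))
```

A template lacking either primer site returns None, mirroring the GS-negative organisms listed above.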

To obtain the full sequence of GS in Mavs and Mptb we generated
a genomic library of Mavs using the restriction endonuclease
EcoRI and cloning into the vector pUC18. This achieved a
representative library which was screened with 32P-labelled
identifier sequence yielding a positive clone containing a 17kbp
insert. We constructed a restriction map of this insert and
identified GS as fragments unique to Mavs and Mptb and not
occurring in laboratory strains of M.avium. These fragments
were sub-cloned into pUC18 and pGEM4Z. We identified GS
contained within an 8kb region. The full nucleotide sequence
was determined for GS on both DNA strands using primer walking
and automated DNA sequencing. DNA sequence for GS in Mptb was
obtained using overlapping PCR products generated using Pwo DNA
polymerase, a proofreading thermostable enzyme. The final DNA
sequences were derived using the University of Wisconsin GCG gel
assembly software package.

EXAMPLE 3

The DNA sequence of GS in Mavs and Mptb was found to be more 
than 99% homologous. The ORFs encoded in GS were identified 
using GeneRunner and DNAStar computer programmes. Eight ORFs 
were identified and designated GSA, GSB, GSC, GSD, GSE, GSF, GSG
and GSH. Database comparisons were carried out against the
GenEMBL Database release version 48.0 (9/96), using the BLAST and
BLIXEM programmes. GSA and GSB encoded proteins of 13.5kDa and
30.7kDa respectively, both of unknown functions. GSC encoded
a protein of 38.4kDa with a 65% homology to the amino acid
sequence of rfbD of V.cholerae, a 62% amino acid sequence
homology to gmd of E.coli and a 58% homology to gca of
Ps.aeruginosa which are all GDP-D-mannose dehydratases.
Equivalent gene products in H.influenzae, S.dysenteriae,
Y.enterocolitica, N.gonorrhoeae, K.pneumoniae and rfbD in
Salmonella enterica are all involved in 'O'-antigen processing
known to be linked to pathogenicity. GSD encoded a protein of
37.1kDa which showed 58% homology at the DNA level to wcaG from
E.coli, a gene involved in the synthesis and regulation of
capsular polysaccharides, also related to pathogenicity. GSE
was found to have a > 30% amino acid homology to rfbT of
V.cholerae, involved in the transport of specific LPS components
across the cell membrane. In V.cholerae the gene product causes
a seroconversion from the Inaba to the Ogawa 'epidemic' strain.
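
The ORF identification step mentioned at the start of this Example was carried out with GeneRunner and DNAStar; the sketch below is not a reconstruction of those programs, only a minimal illustration of the underlying idea. It assumes ATG as the sole start codon, scans only the three forward frames, and estimates protein mass with the common rule of thumb of roughly 110 Da per residue. The toy sequence and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list:
    """Return sorted (start, end) pairs of ATG-to-stop ORFs found in the
    three forward reading frames; `end` is exclusive and includes the stop."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def approx_mass_kda(n_residues: int, da_per_residue: float = 110.0) -> float:
    """Rough protein mass estimate; 110 Da per residue is an assumed average."""
    return n_residues * da_per_residue / 1000.0


# Hypothetical toy sequence with one short ORF in the third frame.
toy = "CCATGGCTGCAAAATGACC"
orfs = find_orfs(toy)
print(orfs, [toy[s:e] for s, e in orfs])

# Under the same rule of thumb, a 13.5kDa product such as GSA would
# correspond to roughly 120-125 residues.
print(approx_mass_kda(123))
```

A practical ORF search on mycobacterial DNA would also need to consider the reverse strand and alternative start codons, which this sketch deliberately omits.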

GSF encoded a protein of 30.2kDa which was homologous in the
range 25-40% at the amino acid level to several glucosyl
transferases such as rfpA of K.pneumoniae, rfbB of K.pneumoniae,
lgtD of H.influenzae, lax of N.gonorrhoeae. In E.coli an
equivalent gene gal B adds β-1-3 N-acetylglucosamine to galactose,
the latter only found in 'O' and 'M' antigens which are also
related to pathogenicity. GSH comprising the ORFs GSH1 and GSH2
encodes a protein totalling about 60kDa which is a putative
transposase with a 40 - 43% homology at the amino acid level to
the equivalent gene product of IS21 in E.coli. This family of
insertion sequences is broadly distributed amongst gram negative
bacteria and is responsible for mobility and transposition of
genetic elements. An IS21-like element in B.fragilis is split
either side of the β-lactamase gene controlling its activation
and expression. We programmed an E.coli S30 cell-free extract
with plasmid DNA containing the ORF GSH under the control of a
lac promoter in the presence of 35S-methionine, and
demonstrated the translation of an abundant 60kDa protein.
The proteins homologous to GS encoded in other organisms are in
general highly antigenic. Thus the proteins encoded by the ORFs
in GS may be used in immunoassays of antibody or cell mediated
immune-reactivity for diagnosing infections caused by
mycobacteria, particularly Mptb, Mavs and Mtb. Enhancement of
host immune recognition of GS encoded proteins by vaccination
using naked specific DNA or recombinant GS proteins may be used
in the prevention and treatment of infections caused by Mptb,
Mavs and Mtb in humans and animals. Mutation or deletion of all
or some of the ORFs A to H in GS may be used to generate
attenuated strains of Mptb, Mavs or Mtb with lower pathogenicity
for use as living or killed vaccines in humans and animals. Such
vaccines are particularly relevant to Johne's disease in animals,
to diseases caused by Mptb in humans such as Crohn's disease, and
to the management of tuberculosis especially where the disease
is caused by multiple drug-resistant organisms.
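
The percentage "homology" figures quoted in this Example came from BLAST and BLIXEM comparisons. As a rough illustration of what such a figure measures, the sketch below computes percent identity over a pre-aligned pair of sequences, skipping gapped columns. The two short peptide strings are invented for the example and the gap-handling convention is an assumption; BLAST's own reported identities depend on its alignment and scoring settings.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over the columns of a pairwise
    alignment in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned peptide fragments (not taken from the GS ORFs).
query = "MKT-LLVA"
subject = "MRTALLIA"
identity = percent_identity(query, subject)
print(round(identity, 1))
```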



WO 97/23624    PCT/GB96/03221

SEQUENCE LISTING 

Seq. ID No.l 

5-- 1 WTCCAAQA AACCCGATGG AACCCCGCGC AAACTATTGG ACGTCTCCGC GCTACGCAGT 
61 TGGGTTGGCG CCCGCGAATC GCAaGAAAG AGGGCATCGA TGCAACGGTG TC6TGGTACC 
121 GCACAAATGC CGATGCCGTG AGGAGGTAAA GCTSCGGGCC GGCCGATGTT ATCCaCCGG 
181 CCGGACGGGT AGGGCGACCT GCCA7CGAGT GGTACGGCAG TKCCTCGCC GGOJAQGGGC 
241 ATGGCCTATG TGAGTATCCC ATAGCCTGGC nGGaCGCC CCTACGCAn ATCAGnGAC 
301 CGCmCGCG CCACGTCGCA GGCTTGCGGC AGCATCCCGT TCAGGTCTCC TCATGGTCCG 
361 GTGTGGCAC6 ACCACGCAAG CTCGAACCGA CTCGinCCC AATHCGCAT GCTAATATCG 
421 CTCGATGGAT TTTTTGCGCA ACGCCGGC7T GATGGCTCGT AACGHAGCA CCGAGATGCT 
481 GCGCCACTCC GAACGAAAGC GCCTAnAGT AAACCAAGTC GAAGCATACG GAGTCAACGT 
541 TGTTATTGAT GTCGGTGCTA ACTCCGQCCA GHCGGTAGC GCTHGCGTC GTGCAGGATT 
601 CAAGAGCCGT ATCGTHCCT TTGAACCTCT HCGGGGCCA TTTGCGCAAC TAACGCGCAA 
661 6TCGGCATCG GATC -3' 



Seq. ID No. 2



5'- 1 OATCCOATCC CGACTTGCOC GTTftSTTGCG CWATQGCCC «5AAA0»tWT TCAAAOSAM 
61 CGATACGGCT CrTOAATCCT GCAOGXOGCA AAGCGCTACC GAACTGGCOG GAGTTAGCAC 
121 CQACATCAAT AACAACGTTG ACTCCGTATC CTrCGACTTG GTITACTAAT A GGCCCT rTC 
IBl GTTCOGAGTO GCGCAGCATC TCGGTGCTAA CCTTACGAGC CATCAAGCCO GCGTTGCGCA 
241 AAAAATCCAT CGAGCGATAT TAOCATGCGA AATTGOOAAA CXSAGTCCGTr CGAGCTTGOG 
30X TGGTCCroCC ACACCOGACC ATGAGGACAC CTGAAOGOGA TSCTOCOOCA AGCCTGCQAC 
361 GTGGCGCGAA AGCGGTCAAC TGATAATGCG TAOGOGCGAG CCAAGCCAGG CTATOGGATA 
421 CTCACATAGG CCATGCGCCT CGCCGGCCAG GOGACTCCCG TACCACTCGA TGGCAOGTCG 
481 CCCTACCCGT CCGGCCGGAG GGATAACATC GGCCOOCCCG CAGCTTTACC TCCTCACGGC 
541 ATCGGCATTT GTGCGGTACC ACSACACCGT TGCATCGATO CCCTCTTTCA GTGCGATTCO 
601 CGGGCGCCAA CCCAACTCCG TAOCOCGGAG ACGTCCAATA GTTTOCGCGG GGTTCCATCG 
661 GGTTTAGTrG GATC -3' 
1 GAXTTCIXJOO TTOGAGAOGA COTCOAACTC CTOGTOGGTC TTGCTTOGAA
51 TOATCCCTCT GATCTGGTCG GCGGTGCCGA CWmACOGT CGACTrCTCG
101 ACGATCACCT TCTACCGGTC GATGTATCAC CCAATGTCGT COGCAACOGA
151 GAAGACGTAC GTCAGGTCCG CCGCCCOGCT TTCACCCATC GGCGTCXWGA
201 OOOCGATXJAA AATQACGTCC OCOTOCTCGA TTCOGCGTrO CCCCT006TG
251 GTGAAGTCAA TCAGCCCCTT CTCACGGTTC CTCOCAATCA ACTCCCAACC
301 CGGGCTOGAA AATCGGGACA CTGCCTGCGA GGAOCAAATC GATCTTGGCC
351 TGAT06ATAT CGACACAGAC GACATCGTTG COOCTATCCX; CGAGACAOOC
401 GCCCGTOACO AGGCCIACAT AOOCTGATCC GACCACOGAA ATITTCAAGA
451 TGACCCCTTC AAGTCCCCOA TCO0TC30ACG ACCATACXGC C^CAACTCTG
501 TACCCTCCGT GGGTAATTCG CATGTC3GCGT TCGTAAGGAG CAGCCAGOOA
551 CTCGGOGACG TTCGGTCAGA OACTCCCA6G ACTACGAGGT TGCCGGTGCG
601 ATACATCACA GTGTTCCGTC TGTCGGCAAC GATGCAOCAA GAACCCAOGG
651 GGCAGCCCTG AACPGCGOGC ATGACCOGTC CTTCTCCTGG CACCTTTGAT
701 CGGCCACCGC TTCCATGCGA ACATGACCGG AATCCATAGC GCGTGGTCAA
751 GCAGCCGGGA GGTAGACGTC CGTCTCATCT GCTCCAACCX5 TCTCXXHOAT
801 AACGATTTCG CTGAACGATC TCGAGGGATT GAAAACCACC GTCGAGAGCG
851 TTCOCGCOCA GCQCTATOGG GGGCGAATCG AGCACaTOGT CATCCACCCT
901 GGATCGGGCXS ACGCXXSTCGT OGAOTATCTQ TCCGGOGATC CTGCCTXTGC
951 ATATTOGCAA TCTCAGCCCG ACAACOGGAC ATATGACGCG ATGAATCAGO
1001 GCATTCCCCA TTCGTCCGGC GACCTGTrGT GGTTTATGCA CTCCACGOAT
1051 CGTTTCTCCG ATCCAGATGC AGTCGCTTCC CTGGTGGAOG CG CTCTCX3GG
1101 GCATOQACCA 0TACX5TGATT TGTGOGGTTA CGGGAAAAAC AAccA ioiCG
1151 GACTC3GACGG CAAACCACTT TTCCCTCGGC CGTACGGCTA TATGCCCTTT
1201 AAGATCCOGA AATTTCTGCT CGGCGCGACG 6TT6CGCATC AQGCG ACATT
1251 CTTCGGCGCG TCCCTGGTAG CCAAGTTGGG CGGTTACGAT CrTGATTTTG
1301 GACTCGAGGC CGACCACCTG TTCATCTACC CTGCCGCACT AATACGGCCT
1351 CCCGTCAC3GA TCGACCGCGT GGTTTGOGAC TTCGATCTCA CGGGACCTOG
1401 TTCAACCCAG CCCATCCGTG AOCACTATCQ GACCCTGCGG CGGCTCTOGG
1451 ACXTOCATOG CGACTACCCG CTGGGTGGGC GCACSAGTGTC GTOOGCTTAC
1501 TroCGTGTGA AGGAGTACTT GATTCGGGCC GACCTGGCCG CATTCAACGC
1551 GGTAAAGTTC TTGCJGAGCGA AGTTCGCCAG AGCTTCGCGC AAGCAAAATT
1601 CATAGAAACC AACTTCTACT OCCTGACCTG AGCAGCGCCG AGGCX5CGCAC
1651 CGCGATCAGT GCGACCTGAA CGCCCAOGTG GAAAOOGCCA CCGATCCCCC
1701 CAC0GAGT6C CTGACGCTTC GGATCCCTTG CACCACAACG AGAGTQAGAO
1751 CXSCCATCATC AGGAAATATC CGCTGGGCCG AGTCAACGCC GGAGTGACAA
1801 AAOTOAOAAC COGGTOAAGC GAGCGCTTAT AACAOOGATC ACGGOOCACW
1851 ATGGTTCCTA CCTCGCCGAO CTACTACTGA GCAAGG6ATA rGAGOTTCAC
1901 GGGCTCOTTC CTCGAGCTrC GACGTTTAAC AOGTCGCGGA TCOATCACCT
1951 CTACGTXGAC CCACACCAAC CGGGCGCGC6 CrrGTTCTTG CACTATGCAG
2001 ACCTCACTGA CXX3CACCCGG TTGGTGACCC TGCTCAGCAG TATO GACCO G
2051 GATOAGGTCT ACAACCTCGC AGOGCACTCC CATGTCCCCG TCACCTTICA
2101 CGAGCCAGTC CATACCGGAG ACACCACCGG CATGOGATCG ATCGGACTTC
2151 TCGAAGCAGT CCGCCTTTCT CGGGTGGACT GCCGCTTCTA TCAGQCTTCC
2201 TCCTCGGAGA TXTTTCGGCXSC ATCTCOGCCA CCGCAGAACG AATCGAOGCC
2251 GTTCTATCCC OSTTCGCCAT ACGGCGOGGC CAACCTCTTC TCGTACTOGA
2301 CGACTCGCAA CTATCGAOAO OCGTACGGAT TATTOGCAGT GAATGCCATC
2351 TixrrrcAACC atcagtcccc coggcgcggc gagactttcg tqacccgaaa
2401 GATCACCCGT GCGGTGGGGC GCATCXX3AGC TQGOSTCCAA TOGOAOOTCT
2451 ATATGGGCAA <rrCGATGa3 ATCCGCXyiCT GGOGCTACGC <XrCCGAATAT
2501 GTCGAOOGGA TGTCGAGGAT GTTGCAAGCG CCTGAACCTG ATGACTAOGT








2551 


CCrGGTOACA 




ACACCGTilOG 




2601 


TTGACCATuT 




TGGCAAAAGC 




2651 


TATTTGCGTC 


WUAUWvnuui 


PflATTCOCTA 




2701 


GGCCCA6TCA 




AvVv • * WW * 


5 


2751 


GCATCATGGT 








2B01 


TGGATCOACA 


OGrcGATGTT 






2651 


ACACCTGG6C 


CTCTQGACCO 






2901 


GGGGCTOGTC 


GGCTI-AuUuU 






2951 


CCAATCTCAT 






10 


3001 


GCAACGTTTG 


ATTTTGTGTC 


TV2&rS&f^AACA 




3051 


GGCCX3CACGG 


GTOGGCGGCA 






3101 


TCTTGTCCGA 


AAACCTwCGA 






3151 


GCCGTGOCiTG 


TGCCQCGGCT 


LX-i i i J wUAU 




3201 


GAAGTAOGCT 


CCGCAACCTA 




IS 


3251 


TOOAGCCCAC 


CAAC6ACGCG 


TATbUaA.AV.Tj 




3301 


CAA6TTCAGG 


CGGTTAOGCG 


CCAATATtJUS* 




3351 


GCCGACTAAC 


CTCTACGGAC 


COGGCGACAA 




3401 


ATCTCTTGCC 


GGOGCrCATC 


MM>l>^r*llfl^1l4|IM 

CGTOQATATG 




3451 


GCAGAAGAGG 


TGACGAATTG 


MMMMfc MM#VMIt< 

GGGGACOGGT 


20 


3501 


GCATGTCGAC 


GATCTGGCGA 


GCGCATGCCT 




3551 


ATGGTCCGAA 


CCAC6TCAAC 


GTG0GCACC6 




3601 


GAGATOGCAG 


ACATGGTCGC 


TACAGCOGTG 




3651 


TTGGGATCCA 


ACTAAACCCX; 


ATGGAACCCC 




3701 


CCOCGCTAOO 


CGAGTTGGGT 


IXSGCXrCCOGC 


25 


3751 


ATCX3AT6CAA 


CGOrOTCGTG 


GTACCGCACA 




3801 


GTAAAGCTGC 


GGGTCGGCC6 


ATG7TATCCC 




3651 


GACCTGCCGT 


CGAGTGGTAC 


GGCAGTOCSCC 




3901 


CTATGOGAGT 


ATCCAATAGC 


CItSGCTTGGC 




3951 


TTGACCGCrr 


TCGOGCCAGC 


TCGCAGGwXT 


30 


4O01 


TCTCCTCATG 


GTCOGGTGTG 


MMk MM * ^>M< ^ 

GCAOQACCAw 




4051 


TTCCCAATTT 


OGCATGCTAA 


TATCGCTCGA 




4101 


CGCTTGATSG 


CrCGTAAOGT 


»m M»t»J> 

TACTTACCQAB 




4151 


AAAG06CCTA 


TTAGTAAACC 


AATTUwUUsU 




4201 


TTGATGTOGG 


TGCTAACTCC 




35 


4251 


GGATTCAAGA 


GCCGTATCGT 


TTCCTTTGAA 




4301 


GCAACTAAOG 


OGCAAGTUM 






4351 


ATGCCCTAGG 


OGACSCCvaAl 






4401 


GOGGOGGCAA 


GTAGTTCOGT 






4451 


CTTTCCTCCC 


GCGAATTATA 




40 


4501 


TTQATT0C3GT 


TGCATCAuAA 


^^^^P^^^^ft M ^^^^ 




4551 


AAGAT0GAC6 


TACAGGGTTT 






4601 


AAO0C1TAAC 


GAAAOCTGCG 


TCGuL-ATl>LA 




4651 


CGTTGTAOQA 


K — jMMm ■ Mk mM 

AGGTGACATG 


CTGATTCATG 




4701 


TCCCTAOOTT 


TCAGACTQAC 


GOuii I'J'O 


45 


4751 


CAATGGTCGA 


ATGCTTCAAG 


CIuACOGuAT 




4801 


GACATAAATG 


CTCCGTCGGC 


ACCCTGCCGQ 




4851 


TGAGCCTGGCC 


TCCXXWOCAC 


CTAATOGACT 




4901 


GACOTGOGGC 


AC3GAACA6GT 


GGCCGGCTGC 




4951 


CT6CX3CCAGT 


GTTCTCGATA 


ATTATCCCTA 


SO 


5001 


CTGCAAGCCT 


GCCTCGGAAG 


CAT0670S0G 




5051 


AGTOCTCCTT 


GTCGACGGCG 


<nTCX»COGA 




5101 


ACA6TTTCC6 


CCCGGAACTC 


GGCTOGCGAC 




5151 


GATOATOGCC 


CCTACGAGGC 


CAT6AAC0GC 



TGAGTTCGCT 
G06TCAA6TT 
GTAOGAGATG 
TCATACTOGT 
TGGA6TG06A 
GGCAGAGTAA 
GTGTATATCG 
ATTTGAGGCC 
TTGATCTGAC 
CCACAGGTGA 
TAACACCTAT 
ATTTOCTCGA 
GGTT06TCAT 
TGCTTTATTG 
CCAAGATCGC 
CT6G0QT0GA 
CTTCrCCCCO 
AGGAA6CCAA 
ACTCOCOCGC 

GCGTCGATCA 
GGCTACATCG 
G06CAAACTA 
GAATCGCACT 
AATGC0GATY3 
TCCGOCX?GGA 
TQQCCGGCGA 
TOGCCCCTAC 
GOGGCAGCAT 
GCAAGCrCGA 
TGGATTTTTT 
ATGCT60GCC 
ATACGGAGTC 
GTAGOGCTTT 
CCTCXTTOGG 
ACTATGGGAO 
CCATCAATGT 
CTTAAAACTC 
AGAOGTTGCA 
CTAC09ATGT 
GTTATCAOGG 
ACTOGAACTT 
AAG0GC1TCA 
CCCGGCTTTA 
TTTCTTCGGT 
TATCCAAACG 
ATCTAAATTG 
TAGOGTTACA 
CCTTCAATGC 
CAGACCTACC 
TCGGACCCTC 
TC3GTCGTTCA 
GG0G7CGG0G 



CAAOCTGCTT 
TGA0GAC06C 
CC6ACAAGGC 
GAACTOCKXSC 
TG6CACACCA 
GTTGAOOACT 
CCGGTCATOG 
GAOGOGTTCA 
GGACOSAGCC 
TCATCOATGC 
CCOGOGGACT 
CXSCAGCIGTC 
GCATCTACCC 
ACTGOCCCTT 
CGGTATCCTG 
TCTCTGCGAT 
TCCG0GTC3GC 
AGCTGGTGGT 
CCGAACTTCT 
GAACATXTCG 
CAGCATTAGC 
GOGAAACACG 
TTGGAC6TCT 
GAAAGAOOGC 
CCGTQAOOAG 
CGGGTGGGGC 
GGCGOGTOGC 
GCATTATCAG 
CCCGTTCACG 
ACCGACTOGT 
-GCGCAAOGCC 
ACTTCGAACG 
AACGrrCTTA 
GCGTCGTGCA 
GGCCATTTGC 
T6TCA0CAGT 
GGCAGGCAAT 
ATCAAGATGC 
ATACACCX^CC 
TACnrCCTG 
GCAGTAAGTC 
TCTTTTATTC 
ACTTCTCTAT 
CGGATCGGCG 
<}GGGACGATT 
GGCGATCTG3 
AGGCGGCCGC 
CACGTCATGA 
AGCGG7GA0G 
tSGOAAGTGGA 
GACATCXK3GA 
CAGOGGGCCC 
TGGCCACAGG 



520 

525 

530 

535 

540 

545 

550 

555 

5€0 

565 

570 

575 

580 

565 

590 

595 

60O 

€05 

610 

615 

620 

625 

630 

635 

640 

645 

650 

655 

660 

€65 

670 

675 

660 

665 

690 

695 

700 

705 

710 

715 

720 

725 

730; 

735 

740 

745 

750 

755 

760 

765 

770 

775 

780 



CGAATCGGTA CmTTTTAG GOGCCGAOCyi CACCCTCTAC 
CGTTGGCCCA GGTAGCCCCT TTTCTCGGCG ACCATGCCCC 
GTCTATGGCG AWX'1X>TGAT GOGTrOOACG AAAA0CXX5GC 

rrrcGACCTC gaccgcctcc tatttgagac gaatttotgc 

TCTTTTACCG CCCTGAGCTT TTCGACGGCA TCGOCCCTTA 
TACOOAOTCT GOGOGGACTG GGACTTCAAT ATTCGCTGCT 
GGCGCTGATT ACCCGCTACA TGGAOGTOGT GATTTCOGAA 
TGACCGGCTT CAGCATGAGG CAGGGGACTG ATAAAGAGTT 
CTGCCAATGT ACTTCTGGCT TGCAGGGTGG GAGACTTSCA 
OOCOTTTTTG AAAGACAAGG ACAATOCCCG TCTGGCCTTG 
TOATAAOGGT TAAOGCCGTC TCCAAAGAAC CAAGCGCAGA 
CGGATCCACA TTG6ACTTCT TTAACGCCTT TOCCTCCTGA 
AACCCaSTTC CGCGTAAOGC GGC3GCGCAGA GAGTGGTOGC 
ACXGfTTCrCXS TGCCACTCCT TGGAAACCGT CGAGCACTCT 
CTTCACGTTC G06CCCX5CTC CIAGAGGTAG CCTGTCAOGT 
AATGAGTGCA ACTOGGCGTC GCCAAAOGTr TCAGTOJCGG 
CACCGCAAGA CTACTGGAGT GCGTGCACAA GOGCCTCCAG 
AAAGCGGATC CAAAGGGATT CGAAGCTTGA GCAACATGCX; 
CGGCCTATGA GGCPGGGACA CGTTTTCGAT CCGOSCCOCA 
AATGGCCAAG TAGAAGTCCC CGCTGGTGGC CAGCAGAACT 
CTGCGGGTGG TTGGCTAATT CTTGOOGGCT OCCTTCTTGT 
GCCCATCCGC TACCACTCGC CGGAGGTGAC GACGATGCTG 
GCAGCCGATC GAGCATGCTG GCGGCXWTGG TGTGCTC30GG 
CCCCATTOTT CX3AAGGGCCA ATGCGAGGCC ATGGCCAGGG 
GTAGCCGGCA GCCACGAGCC GGAACAACAG TTGACTCCCG 
GC3GGGGCGAA GCOGATCTCG TCCAAGATGA CCAGATCCGC 
GTCTCGATCA TCTTGCCGAC GGTGTT6TCG GCCAOGCCGC 
CrCGATCAGG TCGGOOGCGG TGAAGTAGCG GACTTTGAAT 
COGCAOCGTG CCOGCAGCCG AT6AGCAGGT GACmTCCC 
GGGCCAATGA CCOCCAGGTr CTGTTOTXJCC CGAATCCATT 
CAGCTAGTCC AACGTGGCTG CGGTGATCGA CGATCOGGTG 
CGTCGA6GCT CITGGTGACC CGGAAGGCTG CGGCCTTGAG 
GTGTrGOAGG CATCGCGGGC AGCCATCTCG GCCTCAACCA 
GATCTCCTCC GGT6TCCAGC GTTGCGTCTT GCC3GACTTGC 
C3GGCCrr6CG GOGCACCGTG GCCAGCTTCA ACCOCX»CAG 
AGGTCACCAC CCAGOGGTOC CGCCGAGGAC GGTGCCAC05 
GGTGGTCATG AGGCCCTCCC GTOCGTCGTG TTGATCTTGT 
CGAGCGOOTC TCGACGGTCG CCAGATCGAG CACGACTCCXJ 
GGtXKSGGTIXi TGGGGTGCCO GC0CC3GGCGG CCAGGATCGA 
GCAGCX5CGGA ACOOGCGAAA CGCAACCGCC 03G0SCA0CG 
AGCCTCTTCXS CC3GTGGCCGC CGCCAAGGCC GAGCAGAATG 
ATTTCAGTCG GGTGTTGCCG ATCCCAGCAG CACOCACX3AG 
GCTTOGGTTC CCAATGCGCA GAATOGTTrC TCTGCTTGCG 
AGGACCACGC GAGOGTOCOG GTCTOOGTCC GTCGTAGTGT 
TGGACACCTC ACCTGGOCItj AOGAGCTCGT 6CTC00CCAC 
GTCGCAGGTT CCAACAGGAT CAGGCCGCCA TGATOGACCA 
GOTOGCACCG ACGAGCCGCT CAGGCACCGA GTAAC6ACCT 
GGATGCACGA GAGOCCGTCG ACCTTAOQCC GCACC6ACCC 
CTCGGCCGCA GGGAGGGCAG CTCCCTCAAG AOGOTGCGCT 
GCGATCCTTG GOCACGGCGC AGATCTCCGA GTG6ACCGTG 
COGOGCACCA TAGTTGCGCC TCGGCXnTCA CGGCAOGTAG 
TCACCGGCTA ACXJCAGCITC GGTCAGCAGC GGCACCGCAA 
AGCGTAGCCA CAGAOOTTCT CCAGGATGCC CTTCX»TTGC 



GAACCAACCA 
AAGCCATCTT 
ATGCOGGACC 
CACCAATOGA 
CAACCTGOGC 

TcrccAAcrc 

TACAACGACA 

CAGAAAAOGG 

GGCGCATGCT 

0GTA0GGG6T 

ACCXTTAGTCG 

TCCACCTTTC 

ATATOGCATC 

GGTTCGOGTT 

GACTGAAGCC 

TTGAGCAAGA 

CTCGOGGCTG 

AAGGGGAGAA 

ATGCACTGTC 

CCCCACTCOC 

GGTCGGCGTG 

GCGTGGTGCA 

CAGGAATCGC 

AGOGGCGCrC 

GTGTCGTOGA 

GCGGAGCAGG 

GGTAGAGGAC 

CC3GGCGTGGA 

CG7ACCAOGT 

CCAOGCTOGA 

ACGTCGAACC 

ACGGTTGGOO 

AOGTCCCCAG 

AACACCnSCG 

C0CC3GCGTCA 

GCZTGGCAGC 

AGGCerCCAA 

TCGCCGOCGG 

GCGCAGGTOG 

CGTCAATCAA 

TOGAOTTOQO 

GAACTGCTGC 

TT7TCGGG0G 

TCATCGAGGA 

GATCACACOG 

CCACCGCCAC 

GAGCCGTAAC 

CGAGCCGATC 

CG7CAACCAA 

GCATTGACCT 

GTCGACCT6C 

GGTOGTCCTG 

GGATCCGCAC 
7851 COTOGCAGfcA GTCCGGJIACG AAGCCATAGT GGGAOCOBAA TCGCACATAA
7901 TCCOGTSTIG GWkCAACSkAC ATTGGOGACG ACAOCACCTT TOAOGCRCCC
7951 CJffCCOGTCG GCCAGGATCT TOGCCQGAAC CCCACC6ATC CCCTC

Seq. ID No. 4

1 TTCTACroCC TGACCTCAGC AGCGCCGAGG CGC6CXGCGC <}ATC*CTGOG ACCTGAATGG 
61 CCAQGTGGAA AGOGCCACOG ATGCCGSCAC CGAGTGCCTG ACGATTOGGA TCCCTTOCAC 
121 CACAACOAQA GTOAGACCGC CATGATGACG AAATATCCGC TGGGC5GGAGI CAACGCCXSGA 
IBl GTGACAAAAG TGAOAACCOO GTGAAGCGAG CGCTTATAAC AOOGAICAOG GGGCAGGATG 
241 OTTCCTACCT CCCCX5A0CTA CTACTOAGCA AGOGATACGA GGTTCACGGO CTCGTTCGTC 
301 GAGCTTCGAC CTTTAACAOS TCGOSSATCO ATCACCTCTA CGTTQACCCA CACCAACOQG 
361 GCGCGCGCTT GTTCTTGOIC TATGCAGACC TCACTGACGG CACCCGGTTG GTGACCCTGC 
421 TCAGCAGTAT CGACCCGGAT GASGTCTACA ACCICGCAGC <5C*GTCCCAT GTGCOCOICJk 
4B1 GCmXiACGA 6CCAGTGCAT ACCGGAGACA CCACCGGCAT GGGATCGATC CGACTTCTOG 
541 AAGCAGTCCG CCTTTCTCGG GTGGACTGCC GGTTCTATCA iSGClTCCTCQ TCGGAGATGT 
601 TCG6CGCATC TCCGCCACCG CAGAACGAAT CGACGCCGTT CTATCCCC3GT TCGCCATACG 
661 GC6C6GCCAA GGTCTTCTCG TACTGGAOSA CTCGCAACTA TOGAGAGGCO TAGGGATTAT 
721 TCGCAGTGAA TGGCATCTTG TTCAACCATG AGTCCCCCCG GCGCOGCGAG ACTTTC3GTGA 
781 CCCGAAAGAT CACGCGTGCC GTGGC6CXSCA TCCOAGCTGG CGTCCAATCO CAGGTCTATA 
B41 TGGGCAACCT CGATGCGATC CGOGACTGGG GCTACGCGCC CGAATATGTC 6AG0GGATGT 
901 GGAGGATQTT GCAAGCGCCT GAACCTGATO ACTACOTCCT GGCGACAGOG CGTCGITACA 
961 CCGIACGTCA GTTCGCTCAA GCTGCnTO ACCACCTCGO GCTCOACTCG CAAAAGCAOO 

1021 TCAAGrrroA cgaccgctat ttgcgcccca ccgagotcga ttccctacta cgagatocco 

lOBl ACAGGGCGGC CCAGTCACTC CGCTGGAAAG CrrCGGTTCA TACTGGTGAA CTCGCGCGCA 
1141 TCATCGTGGA CCCOGACATC GCCGCCTCGG AGTGCGATG6 CACAC<»TGG ATCOACACJGC 
1201 CXSATCrreCC -roGTrGGGGC CGAGTAAGTT GACGACTACA CCTOGGCCTC TCGACCGCGC 
1261 AACGCCCGTG TATATCOCCG GTCATOGGGG GCTGGTCCOC TCAOCGCTCG TACGTAGATT 
1321 TGAGOCCGAG GGGITCACCA ATCTCATTGT GCGATCACGC <aTGAGATTG ATCTGACGGA 
1381 CCGAGCCOCA ACGrTTGATT TTGTGTCTGA GACAAGACCA CAOGTOATCA TCXSATGCGGC 
1441 CGCaCGGGTC GGCGGCATCA TGGCGAATAA CACCTATCCC GCG GACTTC T TGTCCGAAAA 
1501 CCTCCGAATC CAGACCAATT TGCTCGACSGC AGCTGTCGCC GTGCGTGTGC CGCGGCTCCT 
1561 TTTCCTCGGT TCGTCATGCA TCTACCCGAA GTAOGCTCCG CAACCTATCC ACGAGAGIGC 
1621 TrrATTOACT GGCCCTrTOG AGCCCACCAA CGAOGC6TAT OCGATtXXX» AGATGGCOGG 
1681 TATCCTGCAA GTTCAGGOGG TTAGGC6CCA ATAT6GGCTG OCGT06ATCI CTCCGATCCC 
1741 CACTAACCTC TACGGACCCO GCGACAACTT CTCCCC6TCC GGGTCGCATC TCTTGCCOGC 
35 1801 GCTCATCCGT CGATATGACG AAGCCAAAGC TGGTGGTOCA GAAGAGCTGA CGAATTGGGG 

1861 GACCGGTACT CCGCGGCGOG AACITCTGCA TGTGGAOGAT CTOGOGAGOS CATCCCTGIT 
1921 CCmTGOAA CATTrcOATO GTCOGAACCA COTCAACOTe GGCACtXKSCG TCGATCACAO 
1981 CATTAGCGAG ATCGCAtaiCA TGGTCGCTAC OGOGGTGGGC TACATCGGCO AAACAOCTTG 
2041 GOATCCAACT AAACCCCATO GAACCCCOCG CAAACTATTG GACCTCTCCG COCTAGGCGA 
40 2101 GTTGGGTTGG CGCCOGCGAA TCSICACTCAA AGftOGGCATC GATGCAACGG TGTOGTGGTA 

2161 CCOCACAAAT OCCGATGCCO TGAOGAGGTA AAGCTGCCOG CCXJGCCGATG TTATCCCTCC 
2221 GGCCGGACGG CTAGGGCXaiC CTOCCATCGA -GTGGTACGGC ACTC6CCPGG CCGGCGAGGC 
2281 CCATCGCCTA TOGGADTATC CCATAGCCTG GCTTGGCTCQ CCCCTACGCA TTATCAGTTO 
2341 ACCGCTTTOG OGCCAGCTCG CAOGCTCGCG CCAGCATCGC GTTCAQGTCT CCTCATGOTC 
45 2401 COGTGTCGCA COACCACGCA AOCTCGAACC CACrCCTTTC CCAATTTGGC ATGCTAATAT 

2461 CGCTCGATQG ATTTTTTGCQ CAACGCCX3GC TTGATGGCTC <rrAAanTAG CACCGAGATG 
2521 CTCCOCCACT TCGAACGAAA GCGCCTATTA GTAAAGCAAT TCAAAGCATA CGCAGTCAAC 
2581 CnCTTArrC ATCTCGGTGC TAACTCGGGC CAGTTCGGTA GCGCTTTOGG TOGTGCAQGA 
2641 TTCAAOAGCC CTATCGTITC CnTGAACCT CrrTCGGOGC CATTTGCOCA ACTAAOGOGC 
2701 GAGTCGGCAT CGGATCOkCT ATOGGAGTGT CACCAGTATG C-CCTAGGeGA CGCG6ATGAG



2761 ACGKTTACCA TCAATBTGGC AGGCWITGCG GGGOCAJIGTA tflTCCCTGCT GCCBATOCTT
2821 AXAAGTCATC AAGATGCCTT TCCTCCOGCG WVTTATATTO 6CACCCAAGA COTTCCAATA 
2881 CACCCCCTTO ATICGOTTOC ATCAOAATTT CTGAACCCTA CCGATGTTAC TTrCCTGAAC
2941 ATOGAOGTAC AGGGTTTCOA GAAGCAOGTT ATCGOSGflCA OTAABTCAAC GCTTAAOOAA 
3001 AOCTOOOTOO OCATCC»ACT CCAACTTTCT TTTATTCOGT TGTACGAAGG TGACATGCTG

3061 ATTCATCAAG CGCTTGAACT TGTCTATTCC CTAGCTTTCA GACTCAOQGG TTTOTTGCCC 
3121 GGATTTACCO ATCCGCOCAA TGGTCGAATO CTTCAAGCTG ACGOCATTTT CTTCCGTOGG 
3181 GAOSArrGAC ATAAATGCTT GCGTCGGCAC CCTGCCGOTA TCCAAAOOGG CGATCTGGTG 
3241 AGCCGOCCrC CCGGGCACCT AATCGACTAT CTAAATTGAG CCGGCCBCGA CGTGCGGCAC 
3301 OAACAOGTGG CCGGCTGCXA GCGTTACACA COTCATGACT OCGCCACTGT TCTCGATAAT

3361 TATCCCIACC TTCAATGCAG CGGTGACGCT GCAAGCCTGC CTOGOAAOCA TCGTOOGOCA 
3421 GACCTACCGG GAAGTOQAA6 TOGTCCTTGT C0AC06CG6T TCGACOGATC GGACCCTOGA 
3481 CATCGCGAAC AGTTTCOGCC CGGAACTCGG CTCGCGACTG OTOOTTCACA GCGGOCCOGA
3541 TGATGGCCCC TACGACGCCA TGAACCGCGG CGTCGOCGTA GCCACAGGCG AATGGOTACT 
3601 TTTPrrAGGC GCCGACGACA CCCTCTACGA ACCAACCACG TTOGCCCAGO TAGCCGCTTT

3661 TCTCGGCGAC CATOCGGCAA GCCATCTTOT CTATGGCGAT GTTGTOATGC G ITCGACG AA 
3721 AAGCCGGCAT GCCGGACCTT TCGACCTCGA CCGCCTCCTA TTTGAGACGA ATTTCTCCCA 
3781 CCAATCGATC TnTACOGCC GTGAGCTTTT CGACXXSCATC GGCCCTTACA ACCTOCGCTA
3841 CCGAGTC3XJG GCGGACTCGG ACTTCAATAT TCGCTGCTTC TCCAACCCGG CGCTGATTAC 
3901 CCGCTACATG GACGTCGTGA TTTCOGAATA CAACGACATG ACCCCCTTCA GCATCAGGCA

3961 GOGGACTOAT AAAGAGTTCA GAAAACG6CT GCCAATGTAC TTCTOGGTTG CAGOGTGGGA 
4021 CACrroCABG CGCATOCTCG CXSTTTTTGAA AGACAAOGAG AATCGCCGTC TGGCCTTGCG 
4081 TACX5CGGTTC ATAAGGGTTA AGGCCGTCTC CAAAOAAOGA AGCGCA6AAC CGTAGTCGCG 
4141 GATCCACATT GGACTTCm AACOCGTrTG CGTCCXOATC CACCTTTCAA CCCCGITCCG 
4201 CGTGACGCGG CGCGCAGAGA GTGGICGCAT ATOGCGTCAC TGTTCTCGTG CCACTGCTTG

4261 GAAAGCGTCG AOCACTCIGG TTCOCGrTCT TGACGTTCGC GCCCGCCCCT ACAGGTACCG 
4321 TGTCACGTCA CTCAAOCCAA TGAGTGCAAC TCGGCGTCGC GAAAGGrTTC AGTCGOOGTr 
4381 GAOCAAGACA CCGCAAGACT ACTGGAGTGC GTGCACAAGC GCCTCCAOCT CACOG 



Seq. ID No. 5

1 atgatcgetg cgatctggtc ggcggtgccg acaggaaccg tcgacttgtc gacgatcacc 

61 ttgtaccggt cgatgcatga cccaatgtcg tccgcaaccg agaagacgta cgccaggccc 

121 gccgccccgc tttcacccat gggcgtcggg acggcgatga aaatgacgtc cgcgtgetcg 

181 actcegegtt gecggtcggt ggtgaagtca atcagcccgc teccacggtt cctcgcaatc 

241 aacteccaac ccgggcccga aaatcgggac actgcctgcg aggageaaat cgatcttggc 
301 ctgatcgata tcgacacaga cgacategtt gccgctatcc gcgagecagg cgcccgtgac 
361 gagacctaca tagcctga 



Seq. ID No. 6 

1 MlAVIWSAVPTGTVDl.STlTI.YRSMyDPMS 
31 SATEKTYVRSAAPLSPMGVGTAMKMTSACS 
61 IPRCRSVVKSISPFSRPLAINSOPGI.EKRD 
91 TACEEQIDLGLIDIDTDDIVAAIRETOARD 

121 B A Y I A 
Seq. ID No. 7

1 gtgtcBtctg ctccaaccgt gtcggtgata acgatttcgc tgaacgatct rgagggattg
61 aaaagcaccg tggagagcgt tcgcgcgcag cgctatgggg ggcgaatcga gcacatcgtc 
121 atcgacggtg gatcgggcga cgccgtcgtg gagtatctgt ccggcgatcc tggctttgca 
181 tattggcaat ctcagcccga caacgggaga tatgacgcga tgaatcaggg cattgcccat
241 tcgtcgggcg acctgttgtg gtttatgcac tccacggatc gtttctccga tccagatgca 
301 gtcgcttccg tggtggaggc gctctcgggg catggaccag tacgtgattt gtggggttac 
361 gggaaaaaca accttgtcgg actcgacggc aaaccacttt tccctcggcc gtacggctat 
421 atgccgctta agatgcggaa atttctgctc ggcgcgacgg ttgcgcatca ggcgacattc 
481 ttcggcgcgt cgctggtagc caagttgggc ggttacgatc tcgattttgg actcgaggcg 
541 gaccagctgt tcatctaccg tgccgcacta atacggcctc ccgtcacgat cgaccgcgtg
601 gtttgcgact tcgatgtcac gggacctggt tcaacccagc ccatccgtga gcactatcgg 
661 accctgcggc ggctctggga cctgcatggc gactacccgc tgggtgggcg cagagtgtcg 
721 tgggcttact tgcgcgtgaa ggagtacttg attcgggccg acctggccgc atccaacgcg 
781 gtaaagttct tgcgagcgaa gttcgccaga gcttcgcgga agcaaaactc atag 



Seq. ID NO. 8 

1 VSSAPTVSVITISLNDLEGLKSTVESVRAQ
31 RYGGRIEHIVIDGGSGDAVVEYLSGDPGFA
61 YWQSQPDNGRYDAMNQGIAHSSGDLLWFMH
91 STDRFSDPDAVASVVEALSGHGPVRDLWGY
121 GKNNLVGLDGKPLFPRPYGYMPLKMRKFLL
151 GATVAHQATFFGASLVAKLGGYDLDFGLEA
181 DQLFIYRAALIRPPVTIDRVVCDFDVTGPG
211 STQPIREHYRTLRRLWDLHGDYPLGGRRVS
241 WAYLRVKEYLIRADLAASNAVKFLRAKFAR
271 ASRKQNS













































Seq. ID NO. 9 





1 gtgaagcgag cgcttataac agggatcacg gggcaggatg gttcctacct cgccgagcta
61 ctactgagca agggatacga ggttcacggg ctcgttcgtc gagcttcgac gtttaacacg
121 tcgcggatcg atcacctcta cgttgaccca caccaaccgg gcgcgcgcct gttcttgcac
181 tatgcagacc tcactgacgg cacccggctg gtgaccctgc tcagcagtat cgacccggat
241 gaggtctaca acctcgcagc gcagccccat gcgcgcgtca gctttgacga gccagtgcat
301 accggagaca ccaccggcat gggatcgatc cgacttctgg aagcagtccg cctttctcgg
361 gtggactgcc ggctctatca ggcttcctcg tcggagatgt tcggcgcatc tccgccaccg
421 cagaacgaat cgacgccgtt ctatccccgt tcgccatacg gcgcggccaa ggtcttctcg
481 cactggacga ctcgcaacta tcgagaggcg tacggattat tcgcagtgaa cggcatcttg
541 ttcaaccatg agtccccccg gcgcggcgag actttcgtga cccgaaagat cacgcgtgcc
601 gtggcgcgca tccgagctgg cgcccaaccg gaggtctata tgggcaacct cgacgcgatc
661 cgcgactggg gctacgcgcc cgaatatgtc gaggggacgt ggaggatgtt gcaagcgcct
721 gaacctgacg actacgcccc ggcgacaggg cgtggttaca ccgtacgcga gttcgctcaa
781 gctgcttttg accatgtcgg gctcgactgg caaaagcgcg tcaagtttga cgaccgctat
841 ttgcgtccca ccgaggtcga ttcgctagta ggagatgccg acaaggcggc ccagtcactc
901 ggctggaaag cttcggttca tactggtgaa ctcgcgcgca ccatggtgga cgcggacacc
961 gccgcgttgg agtgcgatgg cacaccatgg atcgacacgc cgatgttgcc tggttggggc
1021 agagtaagtt ga
Seq. ID No. 10












1 


V 


IC 


R 


A 


L 


I 


T G 


ITGQDGSyLAELLLSKCyEVHG 


31 


h 


V 


R 


R 


A 


s 


T F 


NTSRlDHLyVDPHQPGARLFLH 


€1 


y 


A 


0 


L 


T 


D 


G T 


RLVTLLSS IDPDEVyHLAACSH 


91 


V 


R 


V 


S 


F 


D 


BPVHTCDTTGHGSIRLLBAVRLSR 


121 


V 


D 


c 


R 


F y Q A 


SSSEMFGASPPPONBfiTPPyPR 


151 


s 


P 


y 


G 


A 


A 


K V 


FSyMTTRNYREAyGLFAVHGIL 


181 


F 


N 


H 


E 


S 


P 


R R 


GBTFVTRKITRAVARIRAGVOS 


211 


E 


V y 


M 


6 


» 


L D 


AIRDMCYAPBYVEGMWRMLQAP 


241 


E 


p 


D 


D 


y 


V 


Is A 


TGRGyTVREFAQAAPDHVOLDW 


271 


Q 


K 


R 


V 


K 


F 


D D 


RYLRPTEVDSLVGDADKAAQSL 


301 


G 


M 


K 


A 


s 


V 


H T 


GELARIMVDADIAALBCDGTPW 


331 


I 


D 


T 


P 


H 


L 


P O 


W G R V S 



Seq. ID No. 11 

1 gtgaagcgag cgcttacaac agggatcacg gggcaggatg gttcctacct cgccgagcta
61 ctactgagca agggatacga ggttcacggg ctcgttcgcc gagcttcgac gtttaacacg 
121 tcgcggatcg atcaccteta cgttgaccca caeeaaccgg gcgcgegect gttcttgcac 
181 tatgcagacc tcactgacgg cacccggttg gtgaccctgc tcagcagtat cgacccggat 
241 gaggtctaca acctcgcagc gcagtcccat gtgcgcgtca gctttgacga gccagtgcat 
301 accggagaca ceaccggcat gggatcgacc cgacttctgg aagcagtecg cctttctc^ 
361 gtggactgcc ggttctatea ggcttcctcg tcggagatgc tcggcgcatc tccgccacxg 
421 cagaacgaat cgacgccgtc ctatccccgt tcgccatacg gcgcggccaa ggtcctcccg 
481 taetggaega ctcgcaacta tcgagaggcg tacggatcat tegcagtgaa tggcatcctg 
541 ttcaaccatg agtccccccg gcgcggcgag actttcgtga cccgaaagat cacgcgtgcc 
601 gtggcgegca tecgagctgg cgtccaatcg gaggtctaca cgggcaacct cgatgcgatc 
661 cgcgactggg gctacgcgcc cgaatatgtc gaggggatgt ggaggatgtt gcaagcgcct 
721 gaacctgatg actacgtcct ggcgaeaggg cgtggttaca ccgtacgtga gttegctcaa 
781 gctgcctttg accacgtcgg gctcgactgg caaaagcacg tcaagtttga cgaccgctat 
841 ttgegcccca ccgaggtcga ctcgctagta ggagatgccg acagggcggc ccagtcaecc 
901 ggctggaaag cttcggttca tactggtgaa ctcgcgcgca tcatggtgga cgcggacatc
961 gccgcgtcgg agtgcgatgg cacaccatgg atcgacacgc cgatgttgcc tggctggggc
1021 ggagcaagcc ga 



Seq. ID No. 12 
























1 


V K 


R 


A 


L 


I 


T 


G 


I 


TGODGSyLABLLLSKGYEVH 


G 


31 


L V 


R 


R 


A 


5 


T 


P 


N 


T 


s 


R 


IDHLYVDPHOPOARLFL 


H 


$1 


y A 


D 


L 


T 


D 


G 


T 


R 


L 


V 


T 


LLSSIDPOEVyHLAAOS 


H 


91 


V R 


V 


S 


F 


D 


E 


P 


V 


H 


T 


G 


DTTGMGSIRLLEAVRLS 


R 


121 


V D 


c 


R 


F 


y 


Q 


A 


S 


S 


s 


B 


MFGASPPPQNESTPFyP 


R 


151 


S P 


y 


G 


A 


A 


K 


V 


F 


s 


y 


W 


TTRNyREAYGLFAVNGI 


L 


181 


F H 


H 


B 


S 


P 


R 


R 


G 


E 


T 


P 


VTRKITRAVARIRAGVQ 


S 


311 


B V 


y 


H 


G 


N 


L 


D 


A 


I 


R 


D 


NGyAPBYVEGMHRMLQA 


P 


241 


B P 


D 


D 


y 


V 


L 


A 


T 


G 


R 


G 


YTVREFAQAAFDHVGLD 


w 


271 


0 K 


H 


V 


K 


F 


D 


D 


R 


y 


h 


R 


PTBVDSLVGDADRAAQS 


L 


301 


G H 


K 


A 


S 


V 


H 


T 


G 


B 


h 


A 


RIMVDADIAASEC06TP 


W 


331 


I D 


T 


P 


M 


L 


P 


G 


W 


G 


G 


V 


S 





m 
Seq. ID No. 13

1 gtgcgatggc acaccatgga tcgacacgcc gatgttgcct ggttggggca gagtaagctg
61 acgactacac ctgggcctct ggaccgcgca acgcccgcgc atatcgccgg tcatcggggg
121 ctggtcggct cagcgctcgt acgtagattt gaggccgagg ggctcaccaa tctcattgtg
181 cgatcBcgcg atgagattga tctgacggac cgagccgcaa cgtttgattt tgtgtctgag
241 acaagaccac aggtgatcat cgatgcggcc gcacgggtcg gcggcatcat ggcgaataac
301 acctatcccg cggacttctt gtccgaaaac ctccgaatcc agaccaaCtt gctcgacgca
361 gctgtcgccg tgcgcgtgcc gcggctcctt ttcctcggtt cgtcatgcat ccacccgaag
421 tacgctccgc aacctatcca cgagagtgct ttattgactg gcccrccgga gcccaccaac
481 gacgcgtatg cgatcgccaa gatcgccggt atcctgcaag ttcaggcggt taggcgccaa
541 tatgggctgg cgtggatctc tgcgatgccg accaacctct acggacccgg cgacaacttc
601 tccccgtccg ggtcgcatct cttgccggcg ctcatccgtc gacatgagga agccaaagcc
661 ggtggtgcag aagaggtgac gaattggggg accggtactc cgcggcgcga actcccgcac
721 gtcgacgatc tggcgagcgc atgcccgttc ctttcggaac atttcgatgg tccgaaccac
781 gtcaacgtgg gcaccggcgt cgatcacagc attagcgaga tcgcagacat ggtcgctaca
841 gcggtgggcc acatcggcga aacacgttgg gatccaacta aacccgacgg aaccccgcgc
901 aaactattgg acgtctccgc gctacgcgag ttgggctggc gcccgcgaat cgcaccgaaa
961 gacggcatcg atgcaacggt gtcgtggtac cgcacaaacg ccgatgccgt gaggaggcaa



Seq. ID No. 14 



20 


1 


V R M 




31 


T P V 




61 


R S R 




91 


A R V 




121 


A V A 


25 


151 


L L T 




181 


Y G L 




211 


L I R 




241 


V D D 




271 


I S E 


30 


301 


K L L 




331 


R T K 



H T M D H H A D V A W L G Q 
yiAGHROLVG.SALV 
D E I D L T D R A A T F D F 
GGIMANNTYPADFL 
VRVPRLLFI-GSSCI 
GPLEPTHDAYAIAK 
AWISAMPTNLYGPG 
RYEEAKAGGASEVT 
LASACLFLLEHFDG 
lADMVATAVGYIGE 
DVSALREL6WRPRI 
A D A V R R 



s 


K 


L 




T 


T P 


G 


P 


L 


D 


R 


A, 


R 


R 


F 


E 


A 


E G 


F 


T 


N 


h 


I 


V 


V 


S 


E 


T 


R 


P 0 


V 


I 


I 


D 


A 


A 


S 


B 


N 


L 


R 


X Q 


T 


N 


L 


h D 


A 


Y 


P 


K 


Y 


A 


P 0 


P 


I 


H 


E 


S 


A 


X 


A 


G 


I 


L 


Q V 


0 


A 


V 


R 


R 


0 


D 


M 


P 


S 


P 


S G 


s 


H 


L 


L 


P 


A 


M 


H 


0 


T 


G 


T P 


R 


R 


B 


L 


L 


H 


P 


K 


H 


V N 


V G 


T 


G 


V 


D 


H 


S 


T 


R 


H 


0 


P 


T K 


P 


D 


G 


T 


P 


R 


A 


L 


K 


D 


6 


I D 


A 


T 


V 


S 


W 


y 
Seq. ID No. 15

1 gtgcgatggc acaccatgga tcgacacgcc gatgttgcct ggttggggcg gagtaagctg
61 acgactacac ctgggcctct ggaccgcgca acgcccgtgt atatcgccgg tcatcggggg
121 ctggtcggct cagcgctcgt acgtagattt gaggccgagg ggttcaccaa tctcattgtg
181 cgatCBcgcg atgagattga tctgacggac cgagccgcaa cgtttgattt tgtgtctgag
241 acaagaccac aggtgatcat cgatgcggcc gcacgggtcg gcggcatcat ggcgaataac
301 acctatcccg cggactccct gcccgaaaac ctccgaaccc agaccaattt gctcgacgca
361 gctgtcgccg tgcgtgtgcc gcggctcctt ttcctcggtt cgtcatgcat ctacccgaag
421 tacgctccgc aacctatcca cgagagtgct ttattgactg gccctttgga gcccaccaac
481 gacgcgtatg cgaccgccaa gatcgccggt atcctgcaag ttcaggcggt caggcgccaa
541 tatgggctgg cgtggatctc tgcgatgccg actaaccccc acggacccgg cgacaacttc
601 tccccgtccg ggtcgcatcc cctgccggcg ctcatccgtc gacacgagga agccaaagct
661 ggtggtgcag aagaggtgac gaattggggg accggtactc cgcggcgcga actcccgcat
721 gtcgacgatc tggcgagcgc atgcctgctc cttttggaac attccgatgg tccgaaccac
781 gtcaacgcgg gcaccggcgt cgatcacagc attagcgaga tcgcagacat ggtcgctacg
841 gcggtgggct acatcggcga aacacgccgg gatccaacca aacccgatgg aaccccgcgc
901 aaactattgg acgtctccgc gctacgcgag ttgggttggc gcccgcgaat cgcactgaaa
961 gacggcatcg atgcaacggt gtcgtggtac cgcacaaatg ccgatgccgt gaggaggcaa



Seq. ID No. 16



30 



I 


V 


R 


H 


H 


T 


H 


D 


R 


H 


A 


0 


V A 


W 


L G R 


S 


K 


L 


T 


T 


T P G P L D 


R 


A 


31 


T 


P 


V 


Y 


I 


A 


G 


H 


R 


G 


L 


V 


G 


8 


A 


L V 


R 


R 


F 


S 


A 


B G F T N L 


I 


V 


61 


R 


S 


R 


D 


E 


1 


D 


L 


T 


D 


R 


A 


A 


T 


F 


D F V 


S 


E 


T 


R 


P Q V I I 0 


A 


A 


91 


A 


R 


V 


0 


G 


1 


K 


A 


N 


N 


T 


y 


P 


A 


D 


F L 


5 


B 


K 


h 


R 


I Q T H L L 


D 


A 


121 


A V A 


V 


R 


V 


P 


R 


L 


L 


P 


L 


G 


S 


S 


C I 


Y 


P 


K 


Y 


A 


P 0 P I H 6 


S 


A 


151 


L 


L 


T 


0 


P 


L 


E 


P 


T 


N 


D 


A 


Y 


A 


I 


A K 


I 


A 


G 


I 


L Q V 0 A V R 


R 


Q 


181 


y 


G 


L 


A 


H 


I 


5 


A 


M 


P 


T 


N 


L 


Y 


G 


P G 


D 


N 


F 


S 


P 


S G S H L L 


P 


A 


311 


L 


I 


R 


R 


Y 


£ 


E 


A 


K 


A 


G 


0 


A 


£ 


E 


V T 


N 


If 


G 


T 


G 


T P R R E L 


L 


H 


241 


V 


D 


D 




A 


S 


A 


C 


L 


F 


L 


L 


E 


H 


F 


D G 


P 


N 


H 


V 


N 


V G T G V D 


H 


5 


271 


I 


S 


E 


I 


A 


D 


N 


V 


A 


T 


A 


V 


G 


Y 


I 


G E 


T 


R 


W 


D 


P 


T K P D G T 


P 


R 


301 


K 


L 


h 


D 


V 


S 


A 


L 


R 


E 


L 


Q 


H 


R 


P 


R I 


A 


L 


K 


D 


G 


I D A T V S 


W 


Y 


331 


R 


T 


N 


A 


D 


A 


V 


R 


R 

































Seq. ID No. 17 





1 atggattttt tgcgcaacgc cggcttgatg gctcgtaacg ttagtaccga gatgctgcgc
61 cacttcgaac gaaagcgcct attagcaaac caattcaaag catacggagt caacg^tgtt
121 attgatgtcg gtgctaactc cggccagttc ggtagcgcct tgcgccgcgc aggac^caag
181 agccgtatcg tcccctttga acctctttcg gggccatttg cgcaactaac gcgcaagtcg
241 gcatcggatc cactatggga gtgtcaccag tatgccctag gcgacgccga cgagacgatt
301 accatcaatg tggcaggcaa tgcgggggca agcagccccg tgctgccgat gcttaaaagt
361 catcaagatg cctttcctcc cgcgaattat atcggcaccg aagacgtcgc aatacaccgc
421 cttgattcgg Ctgcatcaga atttctgaac cctaccgacg ccactttcct gaagatcgac
481 gtacagggtt tcgagaagca ggttatcacg ggcagtaagt caacgcttaa cgaaagccgc
541 gtcggcacgc oactcgaact tcctcttatt ccgttgcacg aaggtgacat gctgatccat
601 gaagcgcttg aacttgtcta ttccccaggt ttcagaccga cgggtttgtt gcccggctcc
661 acggacccgc gcaatgg^cg aacgcttcaa gccgacggca ttttcttccg tggggacgat
721 tga
Seq. ID No. 18



1 


M D F L R 


N 


A 


G L 


M 


A R N 


VSTBMLRHFERK 


R L L 


V H 


31 


0 F K A Y 


G 


V 


K V 


V 


I D V 


GANSGQPGSALR 


RAG 


F % 


61 


S R I V S 


F 


E 


P L 


S 


GPFAQLTRKSASDPLWEC 


H Q 


91 


Y A L G D 


A 


D 


E T 


I 


T 1 W 


VAGNAGASSSVL 


P N L 


K S 


121 


H 0 D A F 


P 


P 


A N 


Y 


I G T 


EDVAIHRLDSVA 


SEP 


L M 


151 


P T D V T 


P 


L 


K 2 


D V Q G 


FEKQVITG5 RST 


L V 8 


S C 


181 


V G M Q L 


E 


L 


S F 


X 


PLY 


EGDHLIHEALEL 


V Y S 


V G 


211 


F R Xi T G 


L 


L 


P G 


F 


T D P 


RNGRMLQADGXF 


F R G 


D D 



Seq. ID No. 19

1 atggattttt tgcgcaacgc cggcttgacg gctcgtaacg ttagcaccga gatgctgcgc
61 cacttcgaac gaaagcgcct attagcaaac caactcaaag catacggagt caacgttgct
121 attgatgtcg gtgctaactc cggccagctc ggtagcgctt tgcgtcgtgc aggactcaag
181 agccgtarcg tctcctttga acctctttcg gggccatttg cgcaactaac gcgcgagtcg
241 gcatcggatc cactacggga gcgtcaccag tacgccctag gcgacgccga tgagacgatt
301 accatcaatg tggcaggcaa tgcgggggca agtagttccg cgctgccgat gcctaaaagt
361 catcaagatg cctttcctcc cgcgaattat actggcaccg aagacgtcgc aacacaccgc
421 cttgattcgg ttgcatcaga attcctgaac cctacrgatg ttactttcct gaagatcgac
481 gcacagggtt tcgagaagca ggtcatcgcg ggcagtaagt caacgctcaa cgaaagccgc
541 gtcggcatgc aactcgaact ttcttttatc ccgttgtacg aaggrtgacat gctgattcat
601 gaagcgcttg aacctgtcta ctccccaggt ttcagactga cgggtttgtt gcccggattt
661 acggatccgc gcaatggtcg aatgcttcaa gctgacggca ttctcttccg tggggacgat
721 tga



Seq. ID No. 20 



25 



30 



1 


M 


D 


F 


L 


R 


N 


A 


G 


L M A R N 


31 


Q 


F 


K 


A 


y 


G 


V 


N 


V V I D V 


61 


S 


R 


I 


V 


5 


F 


E 


P 


L S G P F 


91 


y 


A 


L 


G 


D 


A 


D 


E 


T I T I N 


121 


H 


0 


O 


A 


F 


P 


P 


A 


N Y I O T 


151 


p 


T 


D 


V 


T 


F 


L 


K 


Z D V 0 G 


181 


V 


G 


N 


Q 


L 


E 


L 


S 


P I P L Y 


211 


F 


R 


L 


T 


G 


L 


L 


F 


G F T D P 



VSTEHLRHFERKRLLVN 
GANSGQFGSALRRAGFK 
AOLTRESASDPLWECHQ 
VAGNAGASSSVLPNLKS 
EDVAXHRLDSVASEFLN 
FEKQVIAGSK6TLNESC 
EGDHLIHEALELVY6LG 
RNGRMLQADGXFFRODD 



WO 97/23624

PCT/GB96/03221



Seq. ID No. 21 

1 atgactgcgc cagcgctctc gataattatc cctacctcca acgcagcggt gacgctgcaa
61 gcctgcctcg gaagcatcgt cgggcagacc taccgggaag cggaagtggc ccctgtcgac
121 ggcggttcga ccgatcggac cctcgacacc gcgaacagtt tccgcccgga acccggctcg
181 cgactggtcg ttcacagcgg gcccgatgat ggcccccacg acgccatgaa ccgcggcgcc
241 ggcgtggcca caggcgaatg ggtacttttc ttaggcgccg acgacaccct ctacgaacca
301 accacgttgg cccaggtagc cgcctctccc ggcgaccacg cggcaagcca tctcgrccat
361 ggcgatgttg tgatgcgttc gacgaaaagc cggcatgccg gacctttcga cctcgaccgc
421 ctcccatttg agacgaattt gtgccaccaa tcgatcttct acegccgtga gcctccxrgac
481 ggcaccggcc cttacaacct gcgctaccga gtccgggcgg actgggactt caatattcgc
541 tgcttctcca acccggcgct gatcacccgc tacatggacg tcgtgatttc cgaacacaac
601 gacatgaccg gcttcagcat gaggcagggg actgataaag agttcagaaa acggctgcca
661 atgtacttct gggttgcagg gtgggagact tgcaggcgca tgccggcgtt tttgaaagac
721 aaggagaatc gccgtctggc cttgcgtacg cggttgataa gggttaaggc cgtctccaaa
781 gaacgaagcg cagaaccgta g



Seq. ID No. 22 



1 


M 


T 


A 


P 


V 


P 


S 


I 


I 


I 


P T 


F 


31 


y 


R 


E 


V 


E 


V V 


L 


V 


D 


6 


G 


S 


61 


R 


L 


V 


V 


H 


S 


0 


P 


D 


D 


G 


P 


y 


91 


h G 


A 


D 


D 


T. 


L 


y 


B 


P 


T 


T 


L 


121 


o 


D 


V 


V 


H 


R 


5 


T 


K 


5 


R 


H 


A 


151 


s 


I 


F 


y 


R 


R 


E 


L 


F 


D 


G 


I 


G 


181 


c 


F 


S 


N 


P 


A 


L 


I 


T 


R 


y 


M 


D 


211 


T 


D 


X 


E 


F 


R 


R 


R 


L 


P 


H 


y 


F 


241 


K 


E 


N 


R 


R 


h 


A 


L 


R 


T 


R 


L 


I 



N 


A A V T Ii 0 A 


C 


L 


G 


S 


I 


V 


Q 


Q 


T 


T 


D R T L D 


I A 


N 


S 


F 


R 


P 


£ 


L 


G 


S 


D 


A N N R G 


V G 


V 


A 


T 


G 


E 


H 


V 


L 


F 


A 


Q V A A F 


L G 


P 


H 


A 


A 


S 


H 


L 


V 


Y 


G 


P F D L D 


R h 


L 


F 


E 


T 


N 


L 


C 


H 


Q 


P 


Y N L R Y 


R V 


H 


A 


D 


H 


D 


F 


N 


I 


R 


V 


V I S B y 


N D 


N 


T 


G 


F 


S 


M 


R 


Q 


G 


w 


V A G N E 


T C 


R 


R 


N 


L 


A 


F 


L 


K 


D 


R 


V K A V S 


K E 


R 


5 


A 


E 


P 











Seq. ID No. 23 

1 atgactgcgc cagtgttctc gataattatc cctaccttca atgcagcggt gacgctgcaa
61 gcctgcctcg gaagcatcgt cgggcagacc taccgggaag tggaagtggt ccttgtcgac
121 ggcggttcga ccgatcggac cctcgacatc gcgaacagtt tccgcccgga actcggctcg
181 cgactggtcg ttcacagcgg gcccgatgat ggcccctacg acgccatgaa ccgcggcgtc
241 ggcgtagcca caggcgaatg ggtacttttt ttaggcgccg acgacaccct ctacgaacca
301 accacgttgg cccaggtagc cgcttttctc ggcgaccatg cggcaagcca tcttgtctat
361 ggcgatgttg tgatgcgttc gacgaaaagc cggcatgccg gacctttcga cctcgaccgc
421 ctcctatttg agacgaattt gtgccaccaa tcgatctttt accgccgcga gcttttcgac
481 ggcatcggcc cttacaacct gcgctaccga gtctgggcgg actgggactt caatattcgc
541 tgcttctcca acccggcgct gattacccgc tacatggacg tcgtgatttc cgaatacaac
601 gacatgaccg gcttcagcat gaggcagggg actgataaag agttcagaaa acggctgcca
661 atgtacttct gggttgcagg gtgggagact tgcaggcgca tgctggcgtt tttgaaagac
721 aaggagaatc gccgtctggc cttgcgtacg cggttgataa gggttaaggc cgtctccaaa
781 gaacgaagcg cagaaccgta g






Seq. ID No. 24 



1 


N 


T 


A P 


V 


F 


S 


I 


I 


I 


F 


T 


p 


31 


y 


R 


E V 


E 


V 


V 


L 


V 


V 


G 


G 


s 


61 


R 


Is 


V V 


H 


s 


G 


P 


D 


D 


6 


P 


y 


91 


L 


G 


A D 


D T 


L 


y 


E 


P 


T T 


L 


121 


G 


D 


V V 


M 


R 


S 


T 


K 


S 


R 


H 


A 


ISI 


S 


I 


F Y 


R 


R 


E 


L 


F 


D 


G 


I 


G 


181 


C 


P 


S N 


P 


A 


L 


I 


T 


R 


y 


H 


D 


211 


T 


D 


K £ 


P 


R 


X 


R 


L 


P 


H 


y 


F 


241 


K 


B 


N R 


R 


L 


A 


L 


R 


T 


R 


h 


I 






N A 


A 


V 


T 


L Q A C 


L 


G S 


I 


V 


G 


Q T 


T D 


R 


T 


L 


D I A N 


S 


F R 


P 


£ 


L 


6 S 


D A 


H 


N 


R 


QVGVATGE 


H 


V 


L F 


A Q 


V 


A 


A 


F L G D 


H 


A A 


S 


H 


L 


V y 


G P 


F 


D 


L 


D R L L 


P 


E T 


N 


L 


C 


H Q 


p y 


M 


L 


R 


y R V w 


A 


D M 


D 


F 


N 


I R 


V V 


I 


S 


B 


y N D H 


T 


G F 


5 


H 


R 


Q G 


W V 


A 


G 


H 


E T C R 


R 


H h 


A 


F 


L 


X D 


R V 


K 


A 


V 


S X E R 


S 


A B 


P 









Seq. ID NO. 25 





1 gtggccagca gaagtcccca ctccgctgcg ggtggttggc taattcttgg cggctccctt
61 ctcgcggccg gcgtggcgca tccggtagga ctcgccggag gtgacgacga cgctggcgtg
121 gtgcagcagc cgatcgagga tgctggcggc ggtggtgtgc tcgggcagga atcgccccca
181 ttgttcgaag ggccaatgcg aggcgatggc cagggagcgg cgctcgtagc cggcagccac
241 gagccggaac aacagttgag tcccggtgcc gtcgagcggg gcgaagccga tctcgtccaa
301 gatgaccaga tccgcgcgga gcagggtgtc gatgatcttg ccgacggtgt tgtcggccag
361 gccgcggcag aggacctcga tcaggtcggc ggcggtgaag tagcggaccc cgaatccggc
421 gcggacggca gcgcgcccgc agccgatgag caggtgactt ttgcccgtac caggtgggcc
481 aatgaccgcc aggctccgtt gtgcccgaat ccattccagg ctcgacaggt agtcgaacgt
541 ggctgcggtg atcgacgacc cggtgacgtc gaacccgccg agggccctgg tgaccgggaa
601 ggctgcggcc ttgagacggt tggcggtgtt ggaggcatcg cgggcagcga tctcggcccc
661 aaccaacgtc cgcaggatct cctccggtgt ccagcgttgc gtcttggcga cttgcaacac
721 ctcggcggcg ttgcggcgca ccgtggccag cttcaaccgc cgcagcgccg cgtcaaggtc
781 agcagccagc ggtgccgccg aggacggtgc caccggcttg gcagcggtgg tcatgaggcc
841 gtcccgtcgg tggtgttgat cttgtag



Seq. ID NO. 26 



1 


V 


A 


S 


R 


S 


P 


H 


S 


A A G G H 


31 


L 


A 


O 


G 


D 


D 


D A G V V Q Q 


61 


L 


F 


E 


G 


P 


M 


R 


0 


D 0 Q G A 


91 


V 


E 


R 


6 


E 


A 


D 


I. 


V Q D D Q 


121 


A 


A 


V 


E 


D 


h 


D 


Q 


V G G G B 


151 


0 V T 


F 


A 


R 


T 


R 


W A N D R 


181 


6 


^ 


G 


D 


R 


R 


S 


G 


D V B P V 


211 


G 


G 


I 


A 


G 


S 


D 


L 


G L N 0 R 


241 


L 


G 


G 


V 


A 


A 


H 


R 


G 0 L 0 P 


271 


H 


R 


L 


G 


5 


G 


G 


H 


E A V P 5 



L 


I L 


G 


G 


S 


h 


LVVGVAHP 


V 


G 


P 


I E 


D 


A 


G 


G 


GGVLGQES 


P 


P 


A 


L V 


A 


G 


S 


H 


EPEOQLSP 


G 


V 


I 


R A 


E 


Q 


0 


V 


DDLADGVV 


0 0 


V 


A D 


F 


E 


S 


G 


VDGSVPAA 


D 


E 


Q 


V L 


L 


C 


P 


N 


PFQARQVV 


E 


R 


B 


G L 


G 


D 


R 


B 


GCGLiETVO 


G 


V 


P 


Q D 


L 


h 


R 


C 


PALRLGDL 


Q 


H 


P 


0 R 


R 


VKV5SQRCRRG 


R 


C 


V 


V L 


I 


L 



Seq. ID No. 27 





1 atgggctgcc tcaaaggtgg tgtcgtcgcc aatgttgtcg ttccaacacc ggattatgtg
61 cgactcgcgc cccaccatgg cttcgttccg gacttctgcc acggtgcgga tccgcaatcg
121 aagggcatcg tggagaacct ctgcggccac gctcaggacg accttgcggt gccgccgctg
181 accgaagctg cgttagccgg tgagcaggtc gacctacgcg ccctcaacgc ccaggcgcaa
241 ctacggcgcg ccgaggtcaa tgccacggtc cactcggaga tctgcgccgt gcccaacgat
301 cgcttggttg acgagcgcac cgtcttgagg gagctgcccc cgctgcggcc gacgatcggc
361 tcggggccgg tgcgccgtaa ggtcgacggc ctcccgcgca tccgttacgg ctcagcccgc
421 tactcggtgc ctcagcggct cgtcggcgcc accgtggcgg tggcggccga tcatggcgcc
481 ctgatcctgt tggaacctgc gaccggcgtg atcgtggccg agcacgagct cgtcagccca
541 ggtgaggtgc ccatcctcga tgaacactac gacggaccca gacccgcacc ct^gcgtggc
601 ccccgcccga aaacccaagc agagaaacga ttctgcgcat tgggaaccga agcgcagcag
661 ttcctcgtcg gtgctgctgc gaccggcaac acccgactga aatccgaacc cgacattctg
721 ctcggccttg gcgccgccca cggcgaacag gctttgatcg acgcgccgcg ccgggcggct
781 gcgtttcgcc ggttccgcgc tgccgacgtg cgctcgatcc tggccgccgg cgccggcacc
841 ccacaacccc gccccgccgg cgacgcactc gtgctcgatc tgcccaccgt cgagacccgc
901 tcgttggagg cccacaagat caacaccacc gacgggacgg ccccatgacc accgctgcca
961 agccggtggc accgtcctcg gcggcaccgc tggccgccga cctcgacgcg gcgctgcggc
1021 ggttgaagct ggccacggtg cgccgcaacg ccgccgaggt gttgcaagtc gccaagacgc
1081 aacgctggac accggaggag accctgcgga cgttggttga ggccgagatc gctgcccgcg
1141 atgcctccaa caccgccaac cgtctcaagg ccgcagcctt cccggtcacc aagaccctcg
1201 acgggttcga cgtcaccgga ccgtcgatca ccgcagccac gttcgaccac ctgccgagcc
1261 tggaatggat ccgggcacaa cagaacctgg cggccaccgg cccacctggc acgggcaaaa
1321 gtcacctgct catcggctgc gggcacgctg ccgtccacgc cggattcaaa gcccgccact
1381 tcaccgccgc cgacctgatc gaggtcctct accgcggcct ggccgacaac accgtcggca
1441 agatcatcga caccccgctc cgcgcggatc tggtcatctc ggacgagacc ggcttcgccc
1501 cgctcgacga caccgggact caactgttgc tccggcccgt ggctgccggc tacgagcgcc
1561 gctccctggc catcgcctcg cattggccct tcgaacaatg ggggcgatCc ctgcccgagc
1621 acaccaccgc cgccagcatc ctcgatcggc tgctgcacca cgccagcatc gtcgccacct
1681 ccggcgagtc ctaccggacg cgccacgccg accacaagaa gggagccgcc aagaactag



Seq. ID No. 28 



1 


M G 


C 


L 


K G G 


V 


V A 


N 


V V V 


31 


D F 


c 


H 


GAD 


p 


0 s 


K 


G I V 


61 


T B 


A 


A 


LAG 


E 


0 V 


D 


L R A 


91 


H S 


E 


I 


C A V P 


N D 


R 


LVD 


121 


S G 


S 


V 


E R K 


V 


D G 


L 


SCI 


151 


T V 


A 


V 


V V D 


H 


G A 


L 


ILL 


181 


G E 


V 


s 


I L D 


£ 


H y 


D 


G P R 


211 


P C 


A 


h 


GTE 


A 


Q 0 


F 


L V G 


241 


li G 


L 


G 


A A H 


G 


B Q 


A 


LID 


271 


R S 


I 


L 


A A G 


A 


G T 


P 


0 P R 


301 


S L 


E 


A 


y K I 


N 


T T 


D 


G T A 



PTPDYVRPASRYGFVP 
ENLCGyAQDDLAVPLL 
LNAQAQLWCABVNATV 
ERTVLRELPSLRPTIG 
RYGSARYSVPQRLVGA 
EPATGVIVABHELVSP 
PAPSRGPRPKTQABKR 
AAAIGNTRLKSELDIL 
ALRRAVAFRRFRAADV 
PAGDALVLDLPTVETR 
S 



Seq. ID No. 29



1 


M 


T T 


A 


A 


K 


P V A P S S A 


31 


T 


V R 


R 


N 


A 


A E V L Q V A 


€1 


E 


Z A 


A 


A 


D 


A S K T A N R 


91 


T 


G S 


5 


I 


T 


A A T F D y L 


121 


P 


G T 


G 


K 


S 


H L L Z G C 0 


151 


L 


Z E 


V 


L 


y 


R G L A D N T 


IBl 


E 


Z G 


F 


A 


p 


L D D T G T Q 


211 


A 


5 H 


H 


P 


F 


E Q W G R F L 


241 


S 


Z V 


V 


T 


S 


G E S y R N R 






A P L 


A 


A 


D 


L D A A L R 


R 




R 


L 


A 


K T 0 


R 


W T 


P E £ Z L R 


T 


h 


V 


B 


A 


L X A 


A 


A 


F 


P V T X T L 


D 


G 


F 


D 


V 


SSL 


B 


« 


Z 


R A C Q N L A 


V 


Z 


G 


P 


H A A 


V 


H 


A 


G F K V R y 


F 


T 


A 


A 


D 


V G X 


Z 


Z 


D 


T L L R A D 


L 


V 


Z 


L 


D 


L L F 


R 


L 


V 


A A G y E R 


R 


5 


L 


A 


Z 


PER 


T 


T 


A 


A S Z L D R 


L 


L 


H 


H 


A 


HAD 


H 


K 


K 


G A A K N 













Seq. ID No. 30 





1 gtgacgtctg ctccgaccgc ctcggtgata acgatctcgt ccaacgacct cgacgggttg
61 cagcgcacgg cgaaaagtgt gcgggcgcaa cgctaccggg gacgcatcga gcacatcgta
121 atcgacggcg gcagcggcga cgacgtggtg gcatacctgt ccgggtgcga accaggcttc
181 gcgtattggc agcccgagcc cgacggcggg cggtacgacg cgatgaacca gggcatcgcg
241 cocgcatcgg gtgatctgct gtggttcttg cactccgccg atcgtttttc cgggcccgac
301 gtggtagccc aggccgtgga ggcgctatcc ggcaagggac cggtgtccga attgtggggc
361 ttcgggatgg accgcctcgt cgggctcgat cgggtgcgcg gcccgatacc tttcagcctg
421 cgcaaattcc tggccggcaa gcaggttgct ccgcatcaag catcgctctt cggatcatcg
481 ctggcggcca agatcggtgg ctacgacctt gatttcggga tcgccgccga ccaggaattc
541 atattgcggg ccgcgctggt atgcgagccg gtcacgattc ggtgtgtgct gtgcgagtcc
601 gacaecaegg gcgtcggctc gcaccgggaa ccaagcgcgg tcttcggtga tctgcgccgc
661 atgggcgacc ttcatcgccg ctacccgttc gggggaaggc gaatatcaca tgcctaccta
721 cgcggccggg agttctacgc ctacaacagt cgattctggg aaaacgtcct cacgcgaacg
781 tcgaaacag



Seq. ID No. 31 



1 M T S A P T V S V I T I S
31 R Y R G R I E H I V I D G
61 A Y W Q S E P D G G R Y D
91 H S A D R F S G P D V V A
121 F G N D R L V G L D R V R
151 P H Q A S F F G S S L V A
181 I L R A A L V C E P V T I
211 P S A V F G D L R R N G D
241
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Seq. ID No. 32



1 gcgaagcgag cgcccatcac cggaatcacc ggccaggacg gctcgtatct cgccgaactg
61 ctgctggcca aggggtatga ggctcacggg ctcatccggc gcgcttcgac gttcaacacc
121 tcgcggaccg atcacctcta cgtcgacccg caccaaccgg gcgcgcggct gtttctgcac
181 tatggtgacc tgatcgacgg aacccggttg gtgaccctgc tgagcaccat cgaacccgac
241 gaggtgtaca acctggcggc gcagtcacac gtgcgggtga gcttcgacga acccgcgcac
301 accggtgaca ccaccggcat gggatccacg cgaccgctgg aagccgttcg gctctctcgg
361 gtgcactgcc gcttctatca ggcgtcctcg tcggagatgc tcggcgcccc gccgccaccg
421 cagaacgogc tgacgccgtt ctacccgcgg tcaccgtatg gcgccgccaa ggtctattcg
481 cactgggcga cccgcaatta tcgcgaagcg tacggattgt tcgccgctaa cggcatcttg
541 ttcaatcacg aatcaccgcg gcgcggcgag acgttcgtga cccgaaagat caccagggcc
601 gtggcacgca tcaaggccgg tatccagccc gaggcctaca cgggcaatct ggBi:gcggtc
661 cgcgactggg ggtacgcgcc cgaatacgtc gaaggcatgt ggcggatgcc gcagaccgac
721 gagcccgacg acttcgtttt ggcgaccggg cgcggtttca ccgtgcgtga gttcgcgcgg
781 gccgcgttcg agcatgccgg ttcggaccgg cagcagtacg tgaaattcga ccaacgctac
841 ctgcggccca ccgaggtgga ttcgctgatc ggcgacgcga ccaaggctgc cgaattgctg
901 ggccggaggg cttcggtgca cactgacgag ttggctcgga ccatggtcga cgcggacacg
961 gcggcgctgg agcgcgaagg caagccgcgg atcgacaagc cgacgaccgc cggccggaca
1021 tga



Seq. ID No. 33
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Seq. ID No. 34

1 atgaggctgg cccgtcgcgc tcggaacatc ttgcgtcgca acggcatcga ggtgtcgcgc 
61 tactttgccg aactggactg ggaacgcaat ttcttgcgcc aactgcaatc gcatcgggcc 
121 agtgccgtgc tcgatgtcgg ggccaattcg gggcagtacg ccaggggcct gcgcggcgcg 
181 ggcttcgcgg gccgcatcgt ctcgttcgag ccgctgcccg ggccctttgc cgtcttgcag
241 cgcagcgcct ccacggaccc gtcgtgggaa tgccggcgct gcgcgctggg cgatgtcgat 
301 ggaaccatct cgatcaacgt cgccggcaac gagggcgcca gcagttccgt cttgccgacg 
361 ttgaaacgac atcaggacgc ctttccacca gccaaccacg tgggcgccca acgggtgccg 
421 atacatcgac tcgatcccgt ggctgcagac gttctgcggc ccaacgatat tgcgttcttg 
481 aagarcgacg ctcaaggatt cgagaagcag gtgaccgcgg gtggcgattc aacggtgcac
541 gaccgatgcg tcggcatgca gctcgagctg tctttccagc cgttgtacga gggtggcatg 
601 ctcatccgcg aggcgcccga cctcgtggac ccgttgggct tcacgctctc gggat^gcaa 
661 cccggtttca ccgacccccg caacggtcga atgctgcagg ccgarggcat cttctcccgg
721 ggcagcgact ga 
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Seq. ID No. 35 
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Seq. ID No. 36 

1 gtgaaatcgt tgaaacccgc tcgtttcatc gcgcgtagcg ccgccttcga ggtcccgcgc
61 cgctattccg agcgagacct gaagcaccag cttgtgaagc aactcaaatc gcgtcgggta
121 gatgtcgttt tcgatgtcgg cgccaactca ggacaatacg ccgccggcct ccgccgagca
181 gcatataagg gccgcattgt ctcgttcgaa ccgctatccg gaccgtttac gatctcggaa
241 agcaaagcgt caacggatcc actttgggat tgccggcagc atgcgttggg cgattctgat
301 ggaacggtta cgatcaatat cgcaggaaac gccggtcaga gcagctccgt cttgcccatg
361 ctgaaaagtc atcagaacgc tcttcccccg gcaaaccatg tcggtaccca agaggcgccc
421 atacatcgac ttgattccgt ggcgccagaa cttctaggca tgaacggcgt cgcttttctc
481 aaggccgacg ttcaaggctt tgaaaagcag gtgctcgccg ggggcaaatc aaccatagac
541 gaccattgcg ccggcatgca actcgaactg tccctcccgc cgttgtacga aggtggcatg
601 ctcatitcctg aagccctcga cctcgtgtat tccttgggct tcacgttgac gggattgc^g
661 ccttgtttca ttgatgcaaa taacggtcga atgttgcagg ccgacggcat ctccttccgc
721 gaggacgatt ga



Seq. ID No. 37
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Seq. ID No. 38





1 atggtgcaga cgaaacgaca cgccggcttg accgcagcca acacaaagaa agtcgccatg
61 gccgcaccaa tgttttcgat catcatcccc accttgaacg cggctgcggc attgcctgcc
121 tgccrcgaca gcatcgcccg tcagacctgc ggtgacttcg agctggtact ggtcgacggc
181 ggctcgacgg acgaaaccct cgacaccgcc aacattttcg cccccaacct cggcgagcgg
241 ctgatcattc accgcgacac cgaccagggc gtctacgacg ccacgaaccg cggcgtggac
301 ctggccaccg gaacgtggct gctctctccg ggcgcggacg acagcctgca cgaggctgac
361 accctggcgc gggtggccgc cttcattggc gaacacgagc ccagcgatct ggtatatggc
421 gacgtgatca tgcgcccaac caatttccgc tggggtggcg ccttcgacct cgaccgtctg
481 ctgttcaagc gcaacatctg ccatcaggcg accttctacc gccgcggact cttcggcacc
541 atcggtccct acaacctccg ctaccgggcc ccggccgacc gggacttcaa tattcgccgc
601 ttttccaacc cagcgctcgt cacccgctac atgcacgtgg tcgttgcaag ctacaacgaa
661 ttcggcgggc tcagcaatac gatcgccgac aaggagtttt tgaagcggct gccgatgtcc
721 acgagactcg gcataaggct ggtcatagtt ctggcgcgca ggtggccaaa ggtgatcagc
781 agggccatgg taatgcgcac cgtcacttct tggcggcgcc gacgttag



Seq. ID No. 39 
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Seq 40: 

GATGCCGTX^AGGAGGTAAAGCTGC 
Seq 41: 
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CLAIMS

1. A polypeptide in substantially isolated form which
comprises any one of the sequences selected from the group 
consisting of Seq.ID.No: 6, 8, 10, 12, 14, 16, 18, 20, 22, 
24, 26, 28, 29, 31, 33, 35, 37 and 39, or a polypeptide 
substantially homologous thereto. 

2. A polypeptide in substantially isolated form which 
comprises any one of the sequences selected from the group 
consisting of Seq.ID.No: 6, 8, 10, 12, 14, 16, 18, 20, 22, 
24, 26, 28, 29, 31, 33, 35, 37 and 39. 

3. A polypeptide which comprises a fragment of a
polypeptide defined in claim 1 or 2, said fragment 
comprising at least 12 amino acids and an epitope. 

4. A polynucleotide in substantially isolated form which 
encodes a polypeptide according to any one of claims 1 to 
3. 

5. A polynucleotide in substantially isolated form which
is capable of selectively hybridizing to SEQ ID NO: 3 or 4 
or a fragment thereof. 

6. A polynucleotide fragment according to claim 5 which 
comprises a sequence selected from the group consisting of 
Seq.ID.No: 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 27, 
or a polynucleotide at least 90% homologous thereto. 
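The "at least 90% homologous" threshold can be illustrated with a simple calculation. The sketch below assumes the two sequences are already aligned and of equal length, and scores homology as ungapped position-by-position identity. That scoring rule, the function names and the 20-base variant sequence are illustrative assumptions, not the definition used in the claims. A real comparison would normally rely on a sequence alignment program.

```python
# Sketch: ungapped percent identity between two pre-aligned sequences.
# The scoring rule and function names are assumptions for illustration only.

def percent_identity(a: str, b: str) -> float:
    """Return the percentage of positions at which a and b carry the same base."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.lower(), b.lower()))
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, threshold: float = 90.0) -> bool:
    """True when the identity is at or above the threshold (default 90%)."""
    return percent_identity(a, b) >= threshold


reference = "atgaggctggcccgtcgcgc"  # first 20 bases of Seq. ID No. 34
variant = "atgaagctggcccgtcgcgt"    # hypothetical variant with two substitutions
print(percent_identity(reference, variant), meets_threshold(reference, variant))
```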

7. A polynucleotide in substantially isolated form 
comprising a sequence selected from the group consisting of 
Seq.ID.No: 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 27. 

8. A polynucleotide in substantially isolated form 
consisting essentially of a sequence selected from the group
Seq ID Nos. 30, 32, 34, 36 and 38, or a polynucleotide at 
least 90% homologous thereto. 
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9. A polynucleotide in substantially isolated form
consisting essentially of a sequence selected from the group
Seq ID Nos. 30, 32, 34, 36 and 38.

10. A polynucleotide probe which comprises a fragment of 
at least 15 nucleotides of a polynucleotide as defined in 
any one of claims 4 to 8, optionally carrying a revealing
label.
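As a rough illustration of the probe-length requirement, the sketch below lists every contiguous 15-nucleotide window of a target sequence and gives the reverse complement of one window, which is the strand that would pair with it. The function names are hypothetical, and the sketch says nothing about hybridization conditions or probe specificity.

```python
# Sketch: enumerate candidate probe fragments of a minimum length.
# Function names are hypothetical; only plain a/c/g/t sequences are handled.

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def probe_candidates(seq: str, min_len: int = 15) -> list[str]:
    """Return every contiguous window of exactly min_len bases."""
    seq = seq.lower()
    return [seq[i:i + min_len] for i in range(len(seq) - min_len + 1)]


target = "gtgacgtctgctccgaccgc"  # first 20 bases of Seq. ID No. 30
probes = probe_candidates(target)
print(len(probes), probes[0], reverse_complement(probes[0]))
```

A target shorter than the minimum length yields an empty list, so no undersized fragment is ever proposed.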

11. A recombinant vector carrying a polynucleotide as 
defined in any one of claims 4 to 8. 

12. An antibody capable of binding a polypeptide or 
fragment thereof as defined in any one of claims 1 to 3.

13. A test kit for detecting the presence or absence of a 
pathogenic mycobacterium in a sample which comprises a 
polynucleotide according to any one of claims 4 to 10, a 
polypeptide according to any one of claims 1 to 3, or an
antibody according to claim 12.

14. A method of detecting the presence or absence of 
antibodies in an animal or human, against a pathogenic
mycobacterium in a sample which comprises:

(a) providing a polypeptide according to any one of
claims 1 to 3 comprising an epitope;

(b) incubating a biological sample with said
polypeptide under conditions which allow for the
formation of an antibody-antigen complex; and

(c) determining whether antibody-antigen complex 
comprising said polypeptide is formed. 

15. A method of detecting the presence or absence of a 
polypeptide according to any one of claims 1 to 3 in a 
biological sample, which method comprises:

(a) providing an antibody according to claim 12;

(b) incubating a biological sample with said antibody 
under conditions which allow for the formation of 
an antibody-antigen complex; and
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(c) determining whether antibody-antigen complex
comprising said antibody is formed. 

16. A method of detecting the presence or absence of cell
mediated immune reactivity in an animal or human, to a 
polypeptide according to claims 1 to 3 which method 
comprises 

(a) providing a polypeptide according to any one of 
claims 1 to 3 comprising an epitope; 

(b) incubating a cell sample with said polypeptide 
under conditions which allow for a cellular 
immune response such as release of cytokines or 
other mediator or reaction to occur; and 

(c) detecting the presence of said cytokine or 
mediator or cellular response in the incubate. 

17. A pharmaceutical composition comprising a polypeptide 
according to any one of claims 1 to 3 in a suitable carrier 
or diluent. 

18. A composition according to claim 17 for use in the
treatment or prevention of diseases caused by mycobacteria. 

19. A method of treating or preventing mycobacterial 
disease in an animal or human caused by mycobacteria which 
express a polypeptide according to claims 1 to 3, which
method comprises vaccinating or treating an animal or human 
with an effective amount of said polypeptide.

20. A method of treating or preventing mycobacterial 
diseases in animals or humans caused by mycobacteria 
containing the polynucleotide of SEQ ID NO: 3 or 4, which 
method comprises vaccinating or treating an animal or human 
with an effective amount of a polynucleotide according to 
claims 4 to 9, or a vector according to claim 11. 

21. A method according to claims 19 or 20 for increasing 
the in vivo susceptibility of mycobacteria to antimicrobial
drugs.
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22. A vaccine comprising a normally pathogenic
mycobacteria, which pathogenicity is mediated in all or in 
part by the presence of the expression of a polypeptide as 
defined in any one of claims 1 to 3, which mycobacteria 
harbours an attenuating mutation in any one of said genes. 

23. A vaccine according to claim 22 wherein the
mycobacteria is selected from Mavs, Mptb and Mtb. 



